Acoustic Frequency-Based Approach for Identification of Photoacoustic Surgical Biomarkers

Spectral unmixing techniques for photoacoustic images are often used to isolate signal origins (e.g., blood, contrast agents, lipids). However, these techniques often require many (e.g., 12–59) wavelength transmissions for optimal performance to exploit the optical properties of different biological chromophores. Analysis of the acoustic frequency response of photoacoustic signals has the potential to provide additional discrimination of photoacoustic signals from different materials, with the added benefit of potentially requiring only a few optical wavelength emissions. This study presents our initial results testing this hypothesis in a phantom experiment, given the task of differentiating photoacoustic signals from deoxygenated hemoglobin (Hb) and methylene blue (MB). Coherence-based beamforming, principal component analysis, and nearest neighbor classification were employed to determine ground-truth labels, perform feature extraction, and classify image contents, respectively. The mean ± one standard deviation of classification accuracy was increased from 0.65 ± 0.16 to 0.88 ± 0.17 when increasing the number of wavelength emissions from one to two, respectively. When using an optimal laser wavelength pair of 710–870 nm, the sensitivity and specificity of detecting MB over Hb were 1.00 and 1.00, respectively. Results are highly promising for the differentiation of photoacoustic-sensitive materials with comparable performance to that achieved with more conventional multispectral laser wavelength approaches.

Existing spectral unmixing techniques generally consist of generating an overdetermined system of equations (i.e., more equations than variables) from the signal response of each chromophore at different laser wavelengths, which can then be solved with an optimization technique based on the known optical absorption coefficient for each chromophore at each wavelength. For example, Xia et al. (2016) used a pseudo inverse approach to differentiate photoacoustic responses originating from water, blood, and lipids. Ding et al. (2017) investigated the effect of alternative versions with nonnegativity constraints to determine the concentration levels of contrast agent injected in in vivo mice. More recently, Grasso et al. (2020) proposed an iterative approach to discriminate blood oxygenation levels by solving the system of equations with a nonnegative matrix factorization, which compensates for the illconditioned invertibility of the absorption coefficient matrix.
Despite their effectiveness, these spectral unmixing techniques are typically not feasible for most real-time applications because of the lengthy acquisition times associated with transmitting multiple laser wavelengths to achieve a single estimate. Traditional spectral unmixing techniques also do not typically consider differences in acoustic spectra, which has the potential to provide additional information for differentiation between biomarkers or different soft tissues.
An alternative to optimization techniques is to consider an analysis of the acoustic spectra using spectral parameters. Initially, spectral parameters obtained from photoacoustic signals were used for characterization of tissues. For example, Kumon et al. (2011) conducted an in vivo study to detect prostate adenocarcinomas using the intercept, slope, and mid-band fit of the frequency response of photoacoustic RF signals, where the use of mid-band fit resulted in statistically significant differentiation between pathological and healthy tissue (p < 0.01). Similarly, Strohm et al. (2013) used both the slope of a linear fit and the spectral peak to discriminate between concentrations of red blood cells. Later, Wang et al. (2015) used the slope parameter to accurately differentiate (p < 0.01) the photoacoustic signals from particles of different diameters in phantom experiments. However, by reducing the dimensionality of the feature space, spectral parameter methods provide a limited snapshot of frequency characteristics. In addition, these methods use a calibration stage from a reference spectra whose source varies among studies (e.g., hair fibers (Kumon et al., 2011), stainless steel blocks (Cao et al., 2017), gold-films (Strohm et al., 2013), and black-dyed polymer micro-spheres (Wang et al., 2015)), which limits the repeatability of classification performance for in vivo applications.
In contrast to spectral unmixing methods, two distinct approaches (i.e., F-mode imaging (Moore et al., 2019) and a method proposed by Cao et al. (Cao et al., 2017)) utilize the complete acoustic spectra for differentiation of photoacoustic targets. F-mode imaging (Moore et al., 2019) consists of dividing the spectra with filter banks and displaying a series of images of a specific frequency content, which are later combined with a labelfree photoacoustic microscopy (PAM) map to selectively enhance the visualization of organelles. The method proposed by Cao et al. (2017) uses the acoustic spectra filtered with the frequency response of the ultrasound transducer to perform k-means clustering of photoacoustic signals originating from two different photoacoustic-sensitive materials. These two approaches share two limitations. First, in contrast to spectral unmixing techniques, labelled regions for each desired chromophore are required. Second, these labelled regions rely on a priori information about the location of materials to be differentiated. These limitations are not ideal for image guidance during surgical interventions and reduce overall classification performance.
To overcome these challenges with traditional spectral unmixing (Xia et al., 2016;Grasso et al., 2020), F-mode (Moore et al., 2019) and k-means clustering (Cao et al., 2017), we propose a novel, more general acoustic frequency-based analysis method to discriminate photoacoustic responses from different materials. The proposed method does not depend on a reference spectrum (as opposed to k-means clustering (Cao et al., 2017)). In addition, the proposed method applies a classification framework using training and testing sets containing known photoacoustic-sensitive materials (i.e., no a priori signal location information is required, unlike F-mode (Moore et al., 2019) and traditional spectral unmixing techniques (Xia et al., 2016;Grasso et al., 2020)). We hypothesize that our proposed method, which relies on an analysis of the acoustic frequency response from a single-or dual-wavelength emission, is sufficient to differentiate biomarkers and has the potential to increase possible frame rates for real-time implementation in the operating room.
To test our hypothesis, a frequency analysis was applied to the received photoacoustic signals from two chromophores-blood and methylene blue. The necessity to differentiate these two chromophores is motivated by recently proposed photoacoustic-guided hysterectomy techniques that require differentiation of uterine arteries from ureters containing methylene blue . Although the focus of this paper is the distinction of these two chromophores, the proposed photoacoustic differentiation is applicable to other chromophores of interest during a surgical procedure.
The remainder of this paper is organized as follows. Section 2 details acquisition, segmentation, and classification methods to identify photoacoustic signals originating from either methylene blue or blood, followed by summaries of existing methods used to benchmark the performance of our approach on the same datasets. Section 3 presents the quantitative and qualitative comparison of the classification performance between the proposed and the existing methods. Section 4 discusses insights from these results and Section 5 summarizes our conclusions.

Experimental Setup
We designed a phantom that mimics the clinical setup of photoacoustic catheter-based interventions, where an optical fiber is attached to a cardiac catheter as it is being inserted through a major vein (Graham et al., 2020a). Another possibility is that a contrast agent may be injected into this vein through the same catheter. Based on these details, Figure 1 shows the experimental setup used to differentiate the two photoacoustic-sensitive materials of hemoglobin and methylene blue discussed throughout this manuscript. A 29 cm × 18 cm × 10 cm (length × width × height) polyvinyl chloride-plastisol (PVCP) phantom was fabricated to contain ten cylindrical, hollow chambers. Each chamber had a diameter of 15 mm and a depth of 55 mm. Two of the chambers were filled with either a 100 μM aqueous solution of methylene blue (MB, ID: S25431A, Fisher Scientific, Waltham, MA) or human blood (Hb), and a 1-mm-diameter optical fiber was inserted in each of the filled chambers. These fibers originated from a bifurcated fiber bundle that was connected to a Phocus Mobile laser (Opotek Inc., Carlsbad, CA, United States), transmitting laser light with wavelengths ranging from 690 to 950 nm in 10 nm increments.
The tip of each optical fiber was positioned approximately 15 mm below the top surface of the chambers, and light was emitted from each fiber tip. By transmitting light locally into each chamber and not globally illuminating multiple chambers simultaneously, we minimized (or systematically controlled) fluence differences and the related amplitude of responses to the optical excitation. The generated photoacoustic signals were received by an Alpinion L3-8 linear array ultrasound probe (center frequency of 5.5 MHz and pitch of 0.3 mm) that was positioned on the lateral wall of the phantom, approximately 40 mm away from the hollow chamber cross section, as shown in Figure 1.
To evaluate the reproducibility of our proposed method, three datasets were acquired (i.e., Datasets 1, 2, and 3). Dataset 1 was acquired first followed by Dataset 2 (acquired 10 h later), followed by Dataset 3 (acquired 13 h after Dataset 2). The Hb samples were stored with interim refrigeration at a temperature of 4°C. The fluence emitted from each fiber tip measured 1.0 mJ.
To characterize the effects of unequal fluence emitted from each fiber tip, three additional datasets were acquired with fluence pairs in the MB and Hb chambers recorded as 0.4 and 1.8 mJ, 1.0 and 1.8 mJ, and 0.4 and 1.0 mJ, respectively. These three datasets were labeled as "Fluence Pair 1", "Fluence Pair 2," and "Fluence Pair 3," respectively. Each fluence pair dataset comprised three subsets, acquired with the same time intervals described in the preceding paragraph for Datasets 1-3.
To evaluate the performance of our proposed method in more challenging environments, particularly in the presence of an aberrating media composed of mostly fatty tissue, three additional datasets were acquired with 2-, 5-, and 7 mm-thick layers of turkey bacon placed between the phantom and the ultrasound probe. This dataset was acquired immediately prior to Dataset 2, thus it is expected to contain similar Hb degradation to that of Dataset 2. The added tissue layers can be considered to represent the fat that is commonly located within skin and within the subcutaneous region of healthy human tissue (Hinkelman et al., 1998). The fluence emitted from each fiber tip measured 1.0 mJ. Figure 2 presents an overview of the proposed framework for differentiating photoacoustic signals sources. For each laser wavelength emission, 10 acquisitions of raw radiofrequency data were acquired. Photoacoustic images were then generated using conventional delay-and-sum (DAS) beamforming. Two regions of interest (ROIs) were automatically defined to separate photoacoustic signals generated from MB and Hb, located on the right and the left sides of the photoacoustic images, respectively. These ground-truth labels were automatically segmented using binary-thresholding of locally weighted short-lag spatial coherence (LW-SLSC) images (Gonzalez and Bell, 2018), with a regularization factor of α 1 and an axial correlation kernel of 0.56 mm. Binary segmentation was performed using a −10 dB threshold mask applied to the LW-SLSC images. A single binary mask was computed per laser wavelength emission, which was the result of the logical inclusive "OR" operation of the 10 masks obtained from the 10 frames. This segmentation resulted in two distinct signals on the left and right sides of the mask, which were assigned the ground-truth labels of MB and Hb, respectively. For each image, only those pixels included in the coherence masks were used for feature extraction, training, and classification.

Atlas of Photoacoustic-Sensitive Materials
A frequency analysis of the photoacoustic pressure waves was performed. For each material (i.e., MB and Hb), the normalized power spectra were calculated from a sliding window of axial kernels of in-phase and quadrature (IQ) data. Principal component analysis (PCA) was applied to the power spectra of photoacoustic signals acquired at each laser wavelength in order to reduce the complexity of the feature space. When using a training set, the principal components were stored in an "atlas" describing each material. Finally, when evaluating the spectra of a test signal, nearest neighbor (k-NN) classification was applied with the L2-norm as the measure of distance between the PCA of the test spectra and the PCA of the spectra within the atlas. Datasets 1-3 (described in Section 2.1) were utilized to compare the performance of the proposed atlas-based method with the performance of the existing classification methods described in Section 2.4. Because these atlas methods require a training set in addition to a test set, Dataset 1 was used for training the atlas methods when testing with Datasets 2 and 3. In addition, to include performance when testing with Dataset 1, Dataset 2 was arbitrarily chosen for training in this scenario. Figure 3 shows two proposed spectral analyses using either one or two wavelengths. In the dual-wavelength analysis, the magnitude of the IQ spectra of the photoacoustic response from a region of interest using two different wavelengths were concatenated, resulting in a region of interest producing a spectrum of size [1 × N] from one wavelength, where N is the number of samples of the spectrum, producing a concatenated spectra P of size [1 × 2N]. This concatenated spectrum was then normalized to its maximum value. No concatenation was required for the single-wavelength analysis.
The initial parameters of the single-and dual-wavelength atlas method used in our previous publication  were modified to maximize the sensitivity, specificity, and accuracy of our approach (see Section 2.6 for metric definitions). In particular, the parameters for in-phase quadrature demodulation, PCA, and k-NN were optimized through an iterative search using laser wavelengths ranging from 650 to 950 nm (see Section 1 of the Supplementary Material for more details). The optimized parameters were utilized for the proposed spectral analyses throughout the manuscript.

Spectral Unmixing Techniques
Spectral unmixing techniques solve the source component reconstruction C of a scanned region by using the multispectral measurement matrix M and an a priori absorption coefficient matrix S of the number of chromophores present in the image. A conventional spectral unmixing solution is the least square method presented below (Glatz et al., 2011;Taruttis et al., 2014;Xia et al., 2016): FIGURE 2 | Overview of proposed method to differentiate photoacoustic signal sources using acoustic frequency information. The blue and red coherence masks show regions of interest for methylene blue (MB) and blood (Hb), respectively. These regions are known for the training set and need to be correctly classified through atlas comparisons during testing. Spectra are asymmetric with respect to frequency because baseband signals were analyzed after IQ demodulation.
where C segments k number of chromophores over a grid of n pixels, m is the number of laser wavelength acquisitions, and S + is the Moore-Penrose pseudoinverse of the absorption coefficient matrix S (i.e., S + S T (SS T ) −1 ). A more robust approach proposed by Grasso et al. (2020) used an iterative non-negative matrix factorization (NNMF) to adjust the initial S and further reduce residual errors: where ⊗ denotes multiplication. The multiplication and division steps are considered element-wise operations and the stopping criteria is defined by an improvement tolerance η. Equations 1, 2 are considered overdetermined systems (i.e., m > k). Both methods were applied to DAS images of the testing data described in Section 2.2 and Figure 3 for two chromophores (i.e., k 2). However, because there is no report of absorption coefficients for methylene blue at optical wavelengths greater than 800 nm, only 12 of the 27 wavelength acquisitions were used for the construction of the matrix M (i.e., m 12). In addition, each equation was regularized by modifying the initial S to: where A is a matrix of ones, and c is an additive coefficient that was varied from 10 −1 to 10 3 cm −1 in multiplicative increments of 10 0.188 (i.e., c[n] 10 −1 × 10 0.188n ).

F-Mode Imaging
Using the testing data, non-normalized DAS images were generated without log-compression and segmented with the coherence mask described in Section 2.2. Log-compressed power spectra were calculated from a sliding window of axial kernels of radiofrequency signals, each 3.85 mm in length. For each spectrum, the integrated frequency content was estimated from four sectors of the frequency domain of 0.2 MHz width and center frequencies of 1, 2, 3, and 4 MHz. Then, for each segmented DAS image, k-means clustering was applied to separate MB and Hb axial kernels. This process was repeated for each single-wavelength acquisition and each frame. Given that the labelling provided by k-mean clustering is arbitrary for each instance of classification, the f value (see Eq. 7) was computed for both original labels (i.e., "1" Hb and "2" MB) and inverted labels (i.e., "1" MB and "2" Hb) for each testing frame. Then, the labelling convention that provided the highest f value was chosen as the final clustering result.

Acoustic-Based Clustering With Calibrated Spectra
To perform the acoustic-based clustering method reported by Cao et al. (Cao et al., 2017), the spectra of each radio-frequency axial kernel were first calibrated to a reference spectrum that models the characteristic frequency response of the ultrasound system. The experiment for determining the optimal reference spectra is detailed in Section 2 of the Supplementary Material. The generation of the log spectra was similar to that of F-mode imaging. However, each spectrum was then calibrated over the reference spectrum and then further normalized at 0 dB, as specified in (Cao et al., 2017). Finally, k-means clustering was conducted using the same labelling criteria as described in Section 2.4.2, and the process was repeated for each singlewavelength acquisition and each frame. For the remainder of this manuscript, we refer to this acoustic-based clustering method as the k-means clustering method. Table 1 summarizes the range of light-emission wavelengths used for each method as well as the corresponding hyperparameter to further improve the robustness of the classification performance. The additive coefficient c represents a trade off between classification performance and reproducibility for the spectral unmixing methods, as the condition number of matrix S′ increases when c increases, and the system becomes more illposed. The variation of the reference spectra evaluates the consistency of the classification results for the k-means clustering method when considering different materials and acquisition setups.

Classification Performance Metrics
MB and Hb were considered to be the positive and negative samples, respectively, when calculating sensitivity, specificity, and accuracy metrics of classification performance. Sensitivity or true positive rate (TPR) measures the fraction of pixels that were correctly classified as methylene blue: where T MB and F Hb are the number of true MB and false Hb pixels, respectively. Similarly, specificity or true negative rate (TNR) measures the fraction of pixels that are correctly classified as deoxygenated blood: where T Hb and F MB are the number of true Hb and false MB pixels, respectively. The combination of sensitivity and specificity is described by the accuracy metric, defined as: Accuracy ACC To determine the optimal parameter for each method, the three quantitative metrics were considered simultaneously, using the optimization expression: where λ is either: (1) a single wavelength for the single-wavelength atlas method, k-means clustering or F-mode imaging method; (2) a pair of wavelengths for the dual-wavelength atlas method; or (3) equivalent to the additive coefficient c for the spectral unmixing methods, whileλ is the optimal λ. Given that a large number of λ values would result in non-optimal or poor classification performance for each method, a subset of five best cases for each classification method was defined for a fair comparison, with the term "case" referring to either a wavelength, wavelength pair, or c.
For the experiment of assessing reproducibility, we first averaged the f values among the 10 frames for each wavelength, wavelength pair, or c. Then, the first 5 cases with the highest average f were selected, and the distribution was obtained from the accuracy values of each of these cases × 10 frames per case. For the experiment of characterizing the effects of unequal fluence emitted from each fiber tip, we first averaged the f values among the 10 frames and among the datasets for each wavelength, wavelength pair, or c. Then, the first 5 cases with the highest average f were selected, and the distribution was obtained from the accuracy values of each of these cases × 10 frames × 3 datasets per case. Finally, a pair-wise t-test was used to evaluate the statistical significance (p < 0.001) of the difference between MB and Hb spectra obtained from either the single-wavelength atlas or dualwavelength atlas method. This statistical analysis used 56,800 spectral samples of MB and 43,060 spectral samples of Hb for the wavelength pair of 710-870 nm (i.e., one of the wavelength pairs that yielded a classification accuracy of 1.00).

Image and Segmentation Examples
The left column of Figure 4 shows example LW-SLSC photoacoustic images co-registered to a DAS ultrasound image obtained with the laser wavelength indicated on each image. The right column of Figure 4 shows the corresponding segmentation mask, where the blue and red regions represent the ground truth labeled pixels for MB and Hb, respectively. Figure 4D shows example compound masks generated by the "OR" logical operation of a range of masks obtained from 690 to 800 nm wavelengths. Figure 4H shows example compound masks generated by the "OR" logical operation from a mask pair obtained with 690 and 920 nm wavelengths. Note that the FIGURE 4 | Locally-weighted short-lag spatial coherence (LW-SLSC) photoacoustic images overlaid on DAS ultrasound images of MB and Hb, obtained with a laser wavelength emission of (A) 710 nm (C), 800 nm (E) 890 nm, and (G) 920 nm. Segmented masks for MB and Hb after a -10 dB threshold was applied to the LW-SLSC photoacoustic images with single-wavelength masks shown for wavelengths of (B) 710 nm and (F) 890 nm, (D) the compound mask from "OR" logical operation on masks generated from 690 to 800 nm, and (H) the resulting mask from the "OR" logical operation of the 690 and 920 nm masks.
Frontiers in Photonics | www.frontiersin.org October 2021 | Volume 2 | Article 716656 varying areas of the LW-SLSC signals and corresponding mask sizes for the MB and Hb regions obtained with different laser wavelength emissions are responsible for different proportions of MB-to-Hb kernel sizes when calculating the quantitative metrics. Figure 5 shows segmentation examples of the best results among the three datasets for each of the classification approaches after estimating the corresponding optimal parameter, using the optimization expression f defined by Eq. 7. The blue and red regions represent correctly classified pixels of MB and Hb, respectively, while the yellow regions represent misclassified pixels. As observed previously in Figure 4, the changes in region size among the approaches are caused by the different LW-SLSC coherence masks that were computed for either single, pairs, or groups of wavelengths, depending on the requirements of each classification method. When qualitatively comparing the classified regions, the dualwavelength atlas method showed the best classification performance, as the majority of each MB and Hb region were labelled correctly (i.e., no yellow regions are shown). Similar performance was observed for the spectral unmixing and spectral unmixing + NNMF, which were generated with an additive coefficient c of 19.31 cm −1 and 9.10 cm −1 , respectively. Conversely, F-mode and k-means clustering could not properly detect signals from MB and Hb, respectively, showing a sensitivity and specificity of 0.76 and 0.79, respectively. A summary of the sensitivity, specificity, and accuracy obtained with the optimal parameter for each method is shown in Table 2.
3.2 Comparison of Sensitivity, Specificity, and Accuracy Figure 6 shows the combined results of sensitivity, specificity, and accuracy obtained from Datasets 1-3, using single-wavelength atlas, dual-wavelength atlas, spectral unmixing, F-mode, and k-means clustering methods. Spectral unmixing and spectral unmixing + NNMF were computed with c 6.23 cm −1 and c 5.18 cm −1 , respectively. Defining an accuracy ≥0.80 as good classification, spectral unmixing and spectral unmixing + NNMF showed a good mean accuracy of 0.85 and 0.87, respectively, among the three datasets. Similarly, the dualwavelength atlas method showed mean accuracy values greater than 0.85 for wavelength pairs of 690 and 810 nm through 840 and 870 nm, as shown in the green-colored middle region of the triangles showed in Figure 6. In contrast, the maximum values of mean accuracy for the singlewavelength atlas method, F-mode imaging, and k-means clustering among the three datasets were 0.77, 0.74, and 0.72 for wavelengths of 890 nm, 750 nm, and 880 nm, respectively. These mean accuracy values were lower than the minimum accuracy required for good classification performance, suggesting that only the dualwavelength atlas method and the spectral unmixing methods showed overall consistent classification performance.
As it is equally important to identify both MB and Hb regions, a high sensitivity and a low specificity pair, or vice-versa, corresponds in practice to a poor classification performance. Therefore, the totality of graphs shown in Figure 6 must be analyzed from this perspective. For some accuracy regions that are shown in blue (i.e., adequate classification), the same regions are colored in red (i.e., poor classification) for either the corresponding specificity or sensitivity or colored in green (i.e., good classification). For example, the single-wavelength atlas method showed a mean ± one standard deviation accuracy of 0.70 ± 0.06 over the wavelength range of 710-740 nm, while showing a sensitivity and specificity of 0.78 ± 0.08 and 0.40 ± 0.14, respectively, over the same wavelength range. Similarly, the k-means clustering method showed a mean ± one standard deviation sensitivity of 0.89 ± 0.11 over the wavelength range of 860-950 nm, while showing FIGURE 5 | Example of segmented regions of MB and Hb using different classification approaches. The blue and red regions represent correctly classified pixels of MB and Hb, respectively, while the yellow regions represent misclassified pixels. Each image shows the frame of the dataset generated with the wavelength emission that achieved the highest accuracy. a specificity and accuracy of 0.63 ± 0.05 and 0.67 ± 0.04, respectively, over the same wavelength range. These results further support the importance of evaluating sensitivity, specificity, and accuracy simultaneously, as described by Eq. 7, to determine the overall classification performance of each method. The mean ± one standard deviation for this combination of accuracy, sensitivity, and specificity (i.e., f in Eq. 7), over the same wavelength ranges described above for the single-wavelength atlas and the k-means clustering methods were 0.65 ± 0.06 and 0.72 ± 0.04, respectively. Table 3 and Table 4 present summaries of quantitative metrics from the average among wavelengths and the average among the best 5 cases for each classification method, respectively. The quantitative results follow the same trend as those described for the qualitative results, where the dual-wavelength atlas method achieved the highest sensitivity, specificity, and accuracy among the other methods. Similarly, the method with the second highest average in the best-5-cases evaluation was the spectral unmixing + NNMF. Figure 7A shows a representative example of spectra from the dual-wavelength atlas method and the equivalent stacked spectra from the single-wavelength atlas method. This display method was chosen to demonstrate the improvement in classification performance when using the dual-wavelength atlas method. For these examples, the testing spectra of 710 and 870 nm wavelengths were used, as this wavelength pair was one of eighty nine to achieve a specificity and sensitivity of 1.00 when using the dual-wavelength atlas method. The mean and standard deviation of the spectra were calculated from the spectra of all segmented pixels of MB and Hb data acquired with 710 nm or 870 nm wavelength for each atlas method. Error bars show one FIGURE 6 | Overall classification results with dual-wavelength atlas, single-wavelength atlas, spectral unmixing methods, F-mode, and k-means clustering using Dataset 1, 2, and 3. Top, middle and bottom rows show the sensitivity, specificity and accuracy of classification, respectively. The left and right columns show the mean and standard deviation over 10 frames, respectively. For each image, the first 2 vertical stripes counting from the left represents the results for spectral unmixing and spectral unmixing + NNMF, respectively, which have a single value from 690 to 800 nm wavelengths. The next three stripes represent the results for single-wavelength atlas, F-mode, and k-means clustering, respectively, as a function of wavelength emission. Finally, the triangle represents the results of the dual-wavelength atlas for each pair of wavelength combination.  standard deviation of the combined results from all kernels within selected ROIs, from 10 image acquisition frames, and from the three datasets. The overlapping spectra with the singlewavelength atlas method applied to either 710 nm or 870 nm wavelength acquisitions (top of Figure 7A) resulted in no statistically significant differences between the amplitude of the spectra for MB and Hb (p > 0.001), while Hb and MB differentiation was achieved with statistical significance (p < 0.001) when using the dual-wavelength atlas method (bottom of Figure 7A). This example illustrates that the enhanced differentiation achieved with the dual-wavelength atlas method can be attributed to the ability to differentiate the two spectra. Figure 7B shows a comparative evaluation of sensitivity and specificity between the atlas methods in an ROC-curve format. This display is included to support the observation that the dualwavelength atlas method achieves high classification performance from a range of wavelengths combinations, whereas the other methods achieve their highest classification performance from just a few cases, based on the results shown in Figure 6. For example, when a threshold region of 0.80 sensitivity and 0.80 specificity is defined as the criterion for adequate classification, Figure 7B demonstrates that 241 wavelength pairs and 0 single wavelengths met this criterion for the dual-and single-wavelength atlas methods, respectively. Therefore, the dual-wavelength atlas method provides a flexible range of light emission wavelength pairs such that the ideal pair can be chosen to differentiate between the same chromophores across multiple imaging environments. This flexibility is necessary when an unwanted chromophore produces a considerable photoacoustic response at the originally selected wavelength pair.  Comparison of sensitivity and 1-specificity from single-and dual-wavelengths atlas method using a one frame per wavelength and wavelength pair, respectively. The threshold regions delimits cases with both sensitivity and specificity greater than 0.8, which represents a good classification performance. coefficient c in Eq. 3. The dashed line represents the optimal c that achieved the highest average f(c) value among the three datasets. When calculating the optimal c 1 , c 2 , and c 3 for Datasets 1, 2, and 3, respectively, the absolute difference between the optimal c and each c 1 , c 2 , and c 3 in the conventional spectral unmixing method was 7.01 cm −1 , 21.87 cm −1 , and 2.70 cm −1 , respectively, resulting in a standard deviation of 12.35 cm −1 . In contrast, the difference between the optimal c and the individual c 1 , c 2 , and c 3 in spectral unmixing + NNMF resulted in a standard deviation of 1.37 cm −1 . This c difference suggests that the optimal c of spectral unmixing + NNMF is less sensitive to the testing data than that of conventional spectral unmixing. However, when evaluating the standard deviation of the classification accuracy at the optimal c, spectral unmixing + NNMF produced standard deviations of 6.47%, 0.54%, and 0.34% for Datasets 1, 2, and 3, respectively, while conventional spectral unmixing produced standard deviations of 1.93%, 0.47%, and 0.48% for Datasets 1, 2, and 3, respectively. Thus, the spectral unmixing techniques did not demonstrate accuracy robustness across different datasets, and this detail must be considered in tandem with the c differences. Figure 9A shows a summary of the accuracy results obtained with the spectral unmixing, single-and dual-wavelength atlas, F-mode, and k-means clustering methods using the full range of wavelengths and additive coefficient c. When comparing the distributions among Datasets 1-3, the maximum difference in median accuracy measured for each dataset when applying spectral unmixing, spectral unmixing + NNMF, F-mode, k-means clustering, and the single-wavelength atlas methods was 15.8%, 21.2%, 6.7%, 4.3%, and 25.5%, respectively, while the dual-wavelength atlas method showed a maximum difference in median accuracy of 1.4%. While these results display wide variations due to the inclusion of a wide range of wavelengths, wavelength pairs, or additive coefficient c, only specific values can be selected in advance and later used in clinical practice. Therefore, Figure 9B shows a subset of accuracy distributions obtained from the best 5 cases among the datasets, as defined in Section 2.6. The dual-wavelength atlas method showed a maximum difference in median accuracy between any two datasets of 0%, which was significantly lower than that obtained from spectral unmixing (7.8%), spectral unmixing + NNMF (9.5%), F-mode (15.6%), k-means clustering (10.1%), and the single-wavelength atlas method (3.3%). In addition, when evaluating the dual-wavelength atlas method on Datasets 1-3, the 710-870 nm wavelength pair was present among the five pairs of wavelengths with the highest accuracy, with sensitivity, specificity, and accuracy of 1.00, 1.00, and 1.00 respectively. Therefore, the dual-wavelength atlas method implemented with this wavelength pair shows higher reproducibility of classification performance than spectral unmixing and other acoustic-based methods. Figure 10 shows the classification accuracy of each dataset acquired with varying fluence pairs. The best 5 cases of wavelengths and additive coefficient c, as defined in Section 2.6. For each of the five existing methods, the maximum difference between the median accuracy values reported for any two fluence pairs (including the "Equal Fluence" Pair) was 3.3% for spectral unmixing, 3.8% for spectral unmixing + NNMF, 18.4% for F-mode, 12.9% for k-means clustering, and 11.9% for the single-wavelength atlas method. For the dual-wavelength atlas method, the maximum difference between the median accuracy values reported for any two fluence pairs was 0.1%, which is lower than the values reported for the five existing methods. These results demonstrate that the dual-wavelength atlas method is robust against changes in fluence levels when compared to acoustic-based methods that do not apply a normalization step such as F-mode. Figure 11 shows the classification accuracy from the dualwavelength atlas method tested on the datasets obtained with added tissue layers. The five wavelength pairs are sorted by the median accuracy obtained in the absence of a layer (i.e., in FIGURE 9 | Classification accuracy of each method when tested on Datasets 1-3, evaluated with (A) the full range of wavelengths and additive coefficient c (with each distribution obtained from 270-3,510 samples, i.e., 10 acquired frames × 27-351 wavelengths, wavelength pairs, or c) and (B) the best 5 cases of wavelengths and c among the datasets (with each distribution obtained from 50 samples, i.e., 10 frames × 5 cases). SU Spectral Unmixing, NNMF Non-negative Matrix Factorization.

Performance With Aberrating Media
FIGURE 10 | Classification accuracy of each method when tested with four fluence pairs, evaluated using the best 5 cases of wavelengths and additive coefficient c among the datasets. Each distribution was obtained from 150 samples (10 frames × 5 cases × 3 datasets). SU Spectral Unmixing, NNMF Non-negative Matrix Factorization.
Frontiers in Photonics | www.frontiersin.org October 2021 | Volume 2 | Article 716656 descending order from left to right). When comparing the results obtained in the absence of a tissue layer, the 2 mm tissue layer resulted in no significant change to the median accuracy for the 780-870 nm and 780-950 nm wavelength pairs and decreased median accuracies of 3.0%, 11.9%, and 8.0% for wavelength pairs of 690-950 nm, 690-870 nm, and 690-780 nm, respectively. When comparing the results obtained in the absence of a tissue layer with the 5 mm tissue layer results, the 780-870 nm and 780-950 nm wavelength pair showed a decrease in median accuracy of 3.5% and 4.9%, respectively, which was lower than that obtained with the remaining wavelength pairs, yielding an average median accuracy decrease of 14.4%. Finally, when comparing the results obtained in the absence of a tissue layer with the 7 mm tissue layer results, the 780-870 nm wavelength pair showed a decrease in median accuracy of 7.1%, which was significantly lower than that obtained with the other wavelength pairs, yielding an average median accuracy decrease of 22.5%. Results demonstrate that the aberrating conditions generally reduce performance due to the presence of the fatty tissue, with a median classification accuracy of 92.9% for a 7 mm-thick tissue layer when using a wavelength pair of 780-870 nm.

DISCUSSION
We demonstrated a novel method to accurately identify biological markers by analyzing the acoustic frequency response from either a single-wavelength emission (i.e., single-wavelength atlas method) or two consecutive wavelength emissions (i.e., dualwavelength atlas method). Overall, the best classification accuracy obtained with the dual-wavelength atlas approach outperforms that obtained from previous methods with similar goals, including spectral unmixing (Xia et al., 2016;Grasso et al., 2020), F-mode imaging (Moore et al., 2019), and k-means clustering (Cao et al., 2017). The dual-wavelength atlas method has three additional advantages over spectral unmixing techniques. First, the dualwavelength atlas method does not require a significant number of wavelengths, which is often an impediment to both real-time capabilities and surgical implementation. With only two wavelength emissions, the overall acquisition time is significantly reduced as well as the memory bandwidth that is proportional to the number of acquired frames. Second, our method does not heavily depend on a hyperparameter to enhance its classification accuracy, contrary to both conventional and NNMF spectral unmixing techniques. In addition, as observed in Figure 8A, there was no single c value that ensured the highest classification accuracy for the three datasets using the conventional spectral unmixing technique, which suggests that testing on a fourth dataset would not necessarily have the optimal accuracy result when using the same c. Third, the dual-wavelength atlas method shows consistent classification performance against different datasets, as observed in Figure 8B. Consequently, finding the same optimal wavelength pair for both datasets further supports the benefit of using only two wavelengths (i.e., 710 and 870 nm) for the identification of biological markers (i.e., MB and Hb) in future cases.
Normalization plays a key role in the single-wavelength atlas, dual-wavelength atlas, and k-means clustering methods because it prevents the use of amplitude as a distinguishing feature for classification. This is particularly important when characterizing structures located at different distances from the light source. By normalizing the spectrum from a single wavelength, both singlewavelength atlas and k-means clustering algorithms rely purely on the acoustic frequency content for tissue differentiation, which is often challenged by the limited frequency content obtained with a limited-bandwidth ultrasound transducer. In contrast, the dual-wavelength atlas method normalizes a pair of spectra, removing the amplitude dependency between two different regions but at the same time preserving the relative amplitude difference of two different light emission responses from the same region. In clinical practice, we envision dual excitation wavelengths illuminating the region of interest with a fastswitching laser source that quickly alternates between wavelengths, providing real-time labeling of photoacousticsensitive regions with comparable performance to that achieved with more conventional spectral unmixing techniques. The proposed method could be beneficial for a range of emerging photoacoustic imaging approaches oriented to surgeries and interventions Wiacek and Lediju Bell, 2021), such as hysterectomies (Allard et al., 2018a;Allard et al., 2018b;Wiacek et al., 2019), neurosurgeries (Bell et al., 2014;Graham et al., 2019;Graham et al., 2020b;Graham et al., 2021), spinal fusion surgeries Gonzalez et al., 2019;, as well as identification and distinction of metallic tool tips (Eddins and Bell, 2017;Allard et al., 2018b), cardiac catheter tips (Graham et al., 2020a), and needle tips (Lediju  from other surrounding structures of interest. Although the average accuracy values from the dualwavelength atlas method shown in Table 3 are low (i.e., 0.88) in comparison to the best case of accuracy (i.e., 1.00), this occurs because we included the classification results from the 351 wavelength pairs combinations used in this study in the reported average. As observed in Figure 6, several wavelength pairs yielded classification accuracy values < 0.65, which lowers the overall accuracy. A similar decrease was observed for the FIGURE 11 | Classification accuracy of the dual-wavelength atlas method when varying the thickness of the aberrating tissue layers, evaluated with five different wavelength pairs. Each distribution was obtained from 10 samples (10 frames × one wavelength pair).
Frontiers in Photonics | www.frontiersin.org October 2021 | Volume 2 | Article 716656 single-wavelength atlas, F-mode, and k-means clustering methods, as each method similarly used a range of wavelengths with most wavelengths producing poor classification results. However, in practice, only a reduced set of wavelength pairs would be used for the differentiation of MB and Hb with the dual-wavelength atlas method. Therefore, Table 4 represents a more realistic average from a reduced set of wavelength pairs for the dual-wavelength atlas method, and a reduced set of single wavelengths and hyper parameters for the other classification methods. One limitation of the atlas approach is the availability of the sample material for generating a significant number of spectra in the training set. While most of the targeted materials for classification are either biological fluids or contrast agents, a non-fluid biological landmark (e.g., bone) would require a new setup to couple the tissue to the background PVCP without using additional photoacousticsensitive materials. In addition, the optical and acoustic properties of the biological materials contained in the PVCP chambers must be similar to the expected in vivo properties in order to maximize success. With consideration that the acoustic response from in vivo and ex vivo Hb may differ due to expected degradation of ex vivo blood, the datasets described throughout this manuscript represented Hb in different states of degradation, and the dualwavelength atlas method showed no significant change in the classification accuracy under these degradation conditions. We additionally demonstrated the robustness of the dual-wavelength atlas method against distortions of the acoustic response due to aberrating media. Future work will explore the impact of additional confounding factors when characterizing biological tissues (e.g., decrease in blood oxygenation , deterioration of the lipid-rich myelin sheath of nerves (Shen et al., 2010)) on the classification performance of the dual-wavelength atlas method.
Additional future work includes improvements to the dualwavelength atlas method for real-time identification of tissues, through the utilization of common parallelization and optimization strategies. The currently sequential comparison of measurements with atlas spectra can potentially be addressed with at least one of three possible strategies. First, graphical processing units (GPUs) may be employed for concurrent comparisons of a test spectrum with several training spectra. The feasibility of GPU-based NN classification has been widely studied and demonstrated in the literature (Garcia et al., 2010;Florimbi et al., 2018;Zhu et al., 2014;Aydin, 2014;Gil-García et al., 2007). Second, atlas factors that affect the computation time may be carefully adjusted (e.g., increase the coherence threshold of the LW-SLSC mask or reduce the number of acquired frames per wavelength emission) without compromising classification performance. Increasing the coherence threshold would produce smaller masks and thus, less pixels to evaluate for classification. Once a new coherence threshold is defined, the number of frames needed for classification can then be empirically determined by maximizing the combined sensitivity, specificity, and accuracy using Eq. 7. Third, rather than evaluating the photoacoustic spectra with an extensive look-up table (i.e., spectral atlas), the classification time can be reduced by implementing an artificial neural network (ANN) to learn and match features of the acoustic spectra (i.e., bypassing PCA feature extraction), which has been successfully implemented in previous studies (Jain and Mao, 1991;Wang et al., 2018).
The proposed framework could be extended to multinomial classification (i.e., more than 2 tissues to classify), which is often necessary during in vivo interventions as photoacoustic signals originating from the surrounding tissue cannot be neglected. In contrast to binomial classification, multinomial classification would benefit from more robust feature extraction methods such as linear discriminant analysis (Belhumeur et al., 1997;Kwak and Pedrycz, 2005) or more robust classifiers such as sparse representation-based classifiers (Wright et al., 2008;Yang et al., 2013). While the increase in the algorithm complexity can be addressed with the strategies discussed in the preceding paragraph, an alternative solution is to include a clustering criteria within the NN classification. Specifically, for trinomial classification where one of the regions would be labelled as a third component that does not exist in the atlas (e.g., background noise). Then, the acoustic spectra from this region can be identified when the error of the closest match surpasses a specific threshold, indicating the presence of a third component. Alternatively, NN-k or fuzzy C-means classification (Chuang et al., 2006) may be employed to characterize the degree of belonging in regions where two materials are combined, with the potential benefit of assessing the percentage of a specific material within a region of interest.

CONCLUSION
We developed a novel acoustic-based photoacoustic classifier that relies on training sets to identify photoacoustic-sensitive materials. The proposed method is robust against changes in fluence levels and showed comparable sensitivity, specificity, and accuracy performance to those obtained with conventional and enhanced spectral unmixing methods. In clinical practice, we envision dual excitation wavelengths illuminating the region of interest with a fast-tuning laser source, providing real-time labeling of photoacoustic-sensitive regions with a GPU-based parallelized algorithm version or deep neural network architectures. Results from the presented experiments are promising for the identification of biological or bio-compatible markers (e.g., blood and contrast agents) during surgical interventions. By using the normalized photoacoustic response from two wavelength iterations, surgeons can localize structures of interest and surgical tools while avoiding other structures that are in close proximity to the targeted operating region.

DATA AVAILABILITY STATEMENT
The datasets presented in this article are not readily available because they may be made available for non-commercial reasons upon reasonable request to the senior author, after acceptable completion of the university's data use agreement. Requests to access the datasets should be directed to mledijubell@jhu.edu.

AUTHOR CONTRIBUTIONS
EG developed the atlas-based methods, designed the experiment, acquired the data, analyzed the results, and drafted the article. CG Frontiers in Photonics | www.frontiersin.org October 2021 | Volume 2 | Article 716656 performed additional investigations and experiments to optimize the dual-wavelength atlas method. MALB conceptualized the idea and edited article drafts. All authors interpreted the experimental results and approved the submitted version of the article.