Automated Classification of Breast Cancer Cells Using High-Throughput Holographic Cytometry

Holographic cytometry is an ultra-high throughput quantitative phase imaging modality that is capable of extracting subcellular information from millions of cells flowing through parallel microfluidic channels. In this study, we present our findings on the application of holographic cytometry to distinguishing carcinogen-exposed cells from normal cells and cancer cells. This has potential application for environmental monitoring and cancer detection by analysis of cytology samples acquired via brushing or fine needle aspiration. By leveraging the vast amount of cell imaging data, we are able to build single-cell-analysis-based biophysical phenotype profiles on the examined cell lines. Multiple physical characteristics of these cells show observable distinct traits between the three cell types. Logistic regression analysis provides insight on which traits are more useful for classification. Additionally, we demonstrate that deep learning is a powerful tool that can potentially identify phenotypic differences from reconstructed single-cell images. The high classification accuracy levels show the platform’s potential in being developed into a diagnostic tool for abnormal cell screening.


INTRODUCTION
Breast cancer is one of the most diagnosed cancers worldwide and its metastasis is the leading cause of death [1,2]. Fine-needle aspiration cytology (FNAC) is a widely used method for breast cancer diagnosis and preoperative assessments. Subsequent to sample collection in FNAC, histopathological examination is the gold standard procedure performed to define histological grade [3]. However, this approach can be time-consuming, costly, labor-intensive and prone to low sensitivity [4]. Previous works show that FNAC coupled with flow cytometry (FC) is a simpler, more rapid process that provides comparable performance to conventional cytologic diagnosis [5][6][7]. These works typically involve the use of immunocapture cytometry systems which have very limited sensitivity to certain abnormal cells due to their incapability to detect cells not expressing the corresponding cell-surface epithelial cell marker [8,9]. Additionally, current flow cytometers are oriented to detecting metastatic cells and little work has been done for identifying prognostic biomarkers of early cancer or precancerous cells. As an alternative, quantitative phase imaging (QPI) coupled with microfluidics offers a highly sensitive, high throughput label-free modality which, through phenotypical profiling, has the potential to be used to detect both malignant and potentially unhealthy cells. QPI-based profiling has already demonstrated high-throughput capabilities and high accuracy levels in identifying cell-cycle phases, in ultrafast all-optical laser scanning approaches [10,11]. In other QPI approaches, deep learning has proven to be a powerful tool used by many for QPI classification [10][11][12][13], inference [14][15][16][17], and reconstruction [18][19][20]. We now seek to use ultra-high throughput QPI to build phenotypic profiles of single cells subjected to carcinogens, which can be, in combination with deep learning, used to develop prognostic biomarkers of precancerous cells.
For this work, we examine exposure to heavy metal ions, which can contribute to carcinogenic development [21][22][23]. Our previous work used a flow assay to study cell stiffness changes in arsenic-treated cells and their relationship with the transformation of normal cells into a carcinogenic state [24]. This earlier study led to our interest in cadmium (Cd), a common pollutant present in food, water, and the surrounding environment [25]. The International Agency for Research on Cancer (IARC) classifies Cd as a group 1 carcinogen to humans [26]. Studies have shown that Cd exposure promotes cancer progression in epithelial cells, and there is ample evidence of its role in inducing cancer [27][28][29]. Thus, Cd-treated cells are a suitable target for assessing aberrant morphologies with QPI and can enable comparisons with cancer cells. By evaluating the precancerous phenotypes in Cd-treated cells, our results may provide valuable insight into the conversion of normal healthy cells to cancer.
In this manuscript, we present a study that applies an ultrahigh throughput QPI approach, termed holographic cytometry (HC), to develop phenotype profiles of breast cells in different states of cancer progression. The profiles can then be used to develop classifiers that discriminate cells in unknown states. HC is based on a stroboscopic QPI approach, extended to high throughput by imaging multiple parallel microfluidic channels simultaneously [30]. The HC system is applied here to acquire images of a breast cancer cell line (BT474), a normal breast epithelial cell line (MCF10A), and Cd-treated MCF10A cells, which are expected to show pre-cancerous changes. Phenotypical differences, determined from analysis of the acquired images, are used to characterize and differentiate each cell line. Convolutional neural network (CNN) and logistic regression algorithms are used to discriminate between the cell types, which can form the basis for identifying early-stage cancer.

Cell Culture Protocol
Three different cell lines are imaged in this experiment: MCF10A, Cd-treated MCF10A and BT474. MCF10A and Cd-treated MCF10A were cultured in Dulbecco's Modified Eagle Medium/F12 (Invitrogen #11330-032) supplemented with 5% horse serum, epidermal growth factor, hydrocortisone, cholera toxin and insulin. For the latter, MCF10A was treated with 2.5 μM Cd for 24 weeks prior to imaging. Both cell lines were passaged every 3 days using 0.25% trypsin. BT474 cells were cultured in Minimum Essential Medium Eagle, supplemented with insulin and 10% bovine serum, and were passaged every 3 days using 0.05% trypsin. Cells were incubated at 37°C in a 5% CO2 environment.

System Overview
The HC system is based on the Mach-Zehnder off-axis interferometer design, as shown in Figure 1. Our system consists of a 640 nm continuous wave laser pulsed by an acousto-optic modulator (AOM) at 300 Hz, with a pulse width of 350 μs. The AOM is synchronized with the camera frame rate through an Arduino-based microcontroller [31]. The pulsed beam is then coupled to an optical fiber and divided by a fiber splitter into the reference and sample arms of the off-axis Mach-Zehnder interferometer. The collimated beam in the sample arm passes through samples contained in a microfluidic element. The beam exiting the sample is magnified using a ×20 objective (NA 0.4) and interfered with the reference arm beam at the beam splitter (BS). A small angle is imparted on the reference field to enable off-axis digital holography [32]. The magnified interferogram is then captured by a wide-frame high-speed camera (Dalsa, 4,096 × 96 px, 300 fps). System throughput is optimized via a wide field of view and stroboscopic illumination, which allows for rapid acquisitions without image streaking. In each acquisition, 10,000 images are acquired within 33 s, enabling the system to image up to 148 cells per second. The overall optical magnification of the system is set to ×30 by selecting the focal length of lens L3. This gives a field of view that covers 16 channels. The resolution of the HC system is 1 μm, which is close to the diffraction limit. The system is highly stable, producing a phase sensitivity of 15 mrad, which corresponds to 1.1 nm.
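As a quick consistency check on the timing figures quoted above, the short sketch below recomputes the acquisition duration and the illumination duty cycle. It uses only values stated in the text (300 Hz pulse rate, 350 μs pulse width, 10,000 frames); the function and variable names are illustrative and are not part of the acquisition software.

```python
# Timing figures taken from the system description.
FRAME_RATE_HZ = 300             # camera frame rate, matched to the AOM pulse rate
PULSE_WIDTH_S = 350e-6          # stroboscopic pulse width
FRAMES_PER_ACQUISITION = 10_000


def acquisition_time_s(n_frames: int, frame_rate_hz: float) -> float:
    """Duration of one acquisition at a fixed frame rate."""
    return n_frames / frame_rate_hz


def illumination_duty_cycle(pulse_width_s: float, pulse_rate_hz: float) -> float:
    """Fraction of each frame period during which the sample is illuminated."""
    return pulse_width_s * pulse_rate_hz


if __name__ == "__main__":
    duration = acquisition_time_s(FRAMES_PER_ACQUISITION, FRAME_RATE_HZ)
    duty = illumination_duty_cycle(PULSE_WIDTH_S, FRAME_RATE_HZ)
    print(f"acquisition time: {duration:.1f} s")
    print(f"illumination duty cycle: {duty:.3f}")
```

Under these numbers the sample is illuminated for roughly a tenth of each frame period, which is consistent with the use of pulsed illumination to limit streaking of flowing cells.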

Microfluidic Channel Fabrication
The microfluidic element comprises 108 or 54 parallel channels implemented in a polydimethylsiloxane (PDMS) substrate using soft lithography methods [32]. Since the diameter of cells from the largest cell line, BT474, is approximately 20 μm, we designed each channel to have a height and width of 30 μm. To fabricate the PDMS channels, the polymer is prepared by mixing base polymer with curing agent at a 10:1 ratio. The mixture is poured onto a patterned photomask and heated in a convection oven at 85°C for 2 hours to form the PDMS channels. After curing, glass is bonded to the channel side of the PDMS, and inlet and outlet channels are added to enable cell media flow entry and exit.

Imaging Experiment
Cells are supplied through the inlet channel of the PDMS element using an automated syringe, with an initial flow speed of 15 μL/min. The actual speed of cells in a given channel depends on the fluid pressure and the cell concentration. We adjust the flow speed to match the camera acquisition rate, so that a sequence of more than three images of each cell entering and exiting the field of view can be captured for cell tracking. MCF10A cells are imaged in continuous PDMS channels (Figure 2A) while Cd-treated MCF10A cells and BT474 cells are imaged in branched PDMS channels (Figure 2B). The channel type is selected based on the cell concentration at the time of the experiment, evaluated beforehand with a hemocytometer. Samples with concentrations below 700,000 cells/mL are flowed in branched channels, while samples with concentrations above 700,000 cells/mL are flowed in continuous channels. This selection criterion was determined by trial and error as the optimal choice for avoiding channel blockage. All three cell lines are flowed through the PDMS elements in their respective cell media.

Postprocessing
FIGURE 1 | System overview [31]. L, lens; BS, beam splitter; AOM, acousto-optic modulator. The first order ray diffracted from the AOM splits into the sample arm and the reference arm. Prior to imaging, both arms are path matched.
Frontiers in Physics | www.frontiersin.org | November 2021 | Volume 9 | Article 759142

For each frame, the off-axis interferogram is processed using standard methods to extract a phase image of the optical path delay through each cell in the sample. Briefly, to obtain single cell images, each frame of the interferogram goes through phase reconstruction, background subtraction, background fitting and digital refocusing (Figure 3) [31]. To ensure that cell data describes individual cells and that cell clumps are avoided, a watershed-based segmentation algorithm is implemented (Figure 4). The segmentation masks are dilated using MATLAB's default dilate function. With the segmented frames, a modified tracking code based on the MATLAB Computer Vision Toolbox was developed to identify duplicate cells that are present across multiple frames [31]. Digital holograms allow for digital refocusing during postprocessing. In this sample, large refocusing ranges (up to tens of microns) are possible due to the tall channel height, so each cell within the same frame is refocused separately to a different propagation distance (Figure 5). Every segmented whole cell identified by the tracking code, including duplicates, is refocused through Fresnel propagation to the plane with minimal amplitude variance, which is taken as the plane of best focus [31]. Twenty-five morphological parameters (see supplementary) are calculated for each single cell image and passed through logistic regression [31]. The raw single cell images are used to train a convolutional neural network (CNN).
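The refocusing criterion in the postprocessing description (Fresnel propagation to the plane of minimal amplitude variance) can be illustrated with a minimal sketch. This is not the authors' MATLAB code: it is a NumPy illustration that assumes a transfer-function form of Fresnel propagation, a synthetic phase-only object in place of a cell, and a hypothetical sample-plane pixel size and search grid. The criterion relies on cells behaving approximately as phase objects, so that the amplitude is most uniform at best focus.

```python
import numpy as np


def fresnel_propagate(field, distance, wavelength, pixel_size):
    """Propagate a complex field by `distance` with the Fresnel transfer function.

    All lengths must share one unit (micrometres in the demo below).
    """
    ny, nx = field.shape
    fx = np.fft.fftfreq(nx, d=pixel_size)
    fy = np.fft.fftfreq(ny, d=pixel_size)
    fxx, fyy = np.meshgrid(fx, fy)
    k = 2.0 * np.pi / wavelength
    transfer = np.exp(1j * k * distance) * np.exp(
        -1j * np.pi * wavelength * distance * (fxx**2 + fyy**2)
    )
    return np.fft.ifft2(np.fft.fft2(field) * transfer)


def refocus_min_amplitude_variance(field, distances, wavelength, pixel_size):
    """Return (best_distance, refocused_field) minimising amplitude variance."""
    best_distance, best_field, best_score = None, None, np.inf
    for z in distances:
        candidate = fresnel_propagate(field, z, wavelength, pixel_size)
        score = np.var(np.abs(candidate))
        if score < best_score:
            best_distance, best_field, best_score = z, candidate, score
    return best_distance, best_field


def demo():
    """Defocus a synthetic phase object by 5 um, then search for best focus."""
    n = 64
    pixel_size = 0.3    # um, hypothetical sample-plane sampling
    wavelength = 0.640  # um, laser wavelength stated in the text
    y, x = np.mgrid[:n, :n] - n / 2
    phase = 2.0 * np.exp(-(x**2 + y**2) / (2 * 5.0**2))  # Gaussian phase bump
    in_focus = np.exp(1j * phase)
    defocused = fresnel_propagate(in_focus, 5.0, wavelength, pixel_size)
    distances = np.arange(-10.0, 10.5, 1.0)  # hypothetical search grid, um
    return refocus_min_amplitude_variance(
        defocused, distances, wavelength, pixel_size
    )
```

Calling `demo()` returns the propagation distance that undoes the applied defocus, together with the refocused field. In practice the search grid would need to span the channel height, and a finer step or a second refinement pass may be appropriate.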

Classification Algorithms
The acquired data are classified using two different algorithms: logistic regression and CNN. In total, 5,662 × 25 descriptors (cells × morphological parameters) are calculated for    Figure 6. The logistic regression model is trained for 10 epochs.
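To make the logistic regression step concrete, the following is a minimal sketch of a binary classifier trained for 10 epochs on a cells × parameters matrix. The optimizer, learning rate, batch size and feature standardization are assumptions, since the text does not specify them, and the demo uses synthetic Gaussian features rather than the measured morphological parameters.

```python
import numpy as np


def standardize(train, test):
    """Scale features to zero mean and unit variance using training statistics."""
    mean = train.mean(axis=0)
    std = train.std(axis=0) + 1e-12
    return (train - mean) / std, (test - mean) / std


def train_logistic_regression(features, labels, epochs=10, learning_rate=0.1,
                              batch_size=32, seed=0):
    """Mini-batch gradient descent on the binary cross-entropy loss."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = features.shape
    weights = np.zeros(n_features)
    bias = 0.0
    for _ in range(epochs):
        order = rng.permutation(n_samples)
        for start in range(0, n_samples, batch_size):
            batch = order[start:start + batch_size]
            logits = features[batch] @ weights + bias
            probs = 1.0 / (1.0 + np.exp(-logits))
            error = probs - labels[batch]
            weights -= learning_rate * (features[batch].T @ error) / len(batch)
            bias -= learning_rate * error.mean()
    return weights, bias


def predict(features, weights, bias):
    """Class 1 when the predicted probability exceeds 0.5."""
    return (features @ weights + bias > 0).astype(int)


def demo(seed=0):
    """Train on synthetic two-class data and return the held-out accuracy."""
    rng = np.random.default_rng(seed)
    n_per_class, n_features = 300, 4  # synthetic stand-in, not the real data
    class0 = rng.normal(-1.5, 1.0, size=(n_per_class, n_features))
    class1 = rng.normal(1.5, 1.0, size=(n_per_class, n_features))
    features = np.vstack([class0, class1])
    labels = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])
    order = rng.permutation(len(labels))
    features, labels = features[order], labels[order]
    n_train = 480
    train_x, test_x = standardize(features[:n_train], features[n_train:])
    weights, bias = train_logistic_regression(train_x, labels[:n_train])
    return np.mean(predict(test_x, weights, bias) == labels[n_train:])
```

One attraction of this model for phenotype profiling is that, once features are standardized, the magnitude of each learned weight gives a rough indication of how strongly that parameter drives the decision.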
In contrast to logistic regression, CNN classification accepts raw phase images as input and outputs classification results. Based on ResNet, the deep neural network consists of three residual (res) blocks (Figure 7B) and a final convolution layer. Conceptually, each res block serves as a residual minimizer (Figure 7A).
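The idea behind a residual block can be shown with a small NumPy sketch: the block outputs its input plus a learned correction F(x), so that with zero weights it reduces to the identity. The layer layout here (two single-channel 3 × 3 convolutions with ReLU and no normalization) is an assumption for illustration only; the text specifies only that the network has three res blocks and a final convolution layer.

```python
import numpy as np


def conv3x3_same(image, kernel):
    """3x3 cross-correlation with zero padding; output has the input's shape."""
    padded = np.pad(image, 1)
    out = np.zeros_like(image, dtype=float)
    rows, cols = image.shape
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + rows, dx:dx + cols]
    return out


def relu(x):
    """Rectified linear unit."""
    return np.maximum(x, 0.0)


def residual_block(x, kernel1, kernel2):
    """Compute relu(x + F(x)) with F = conv -> relu -> conv.

    The skip connection means the convolutions only need to learn the
    residual between the desired output and the input.
    """
    residual = conv3x3_same(relu(conv3x3_same(x, kernel1)), kernel2)
    return relu(x + residual)
```

With all-zero kernels the block passes a non-negative input through unchanged, which is the property that makes stacks of such blocks easier to train than plain convolution stacks.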

RESULTS
In the span of 33 s . We observe that among the 25 morphological parameters extracted, optical volume (OV), area, major axis length and mean phase show the clearest trends. As shown in Figure 8(A-H), the mean area, mean OV and mean major axis length increase across the three cell lines in the order MCF10A, Cd-treated MCF10A, BT474. While the BT474 cell line has more variation in OV values, the Cd-treated MCF10A and MCF10A cell lines show more homogeneity.
The dataset for each cell line is split into a training subset of 2,700 cells and a testing subset of 131 cells. As shown in Table 1, logistic regression yields the highest binary classification mean accuracy when all 25 parameters are used for training. If only the four morphological parameters highlighted above (area, OV, major axis length, mean phase) are used as input for logistic regression, the binary classification still yields adequate accuracy levels, as shown in Table 2. Overall, logistic regression demonstrates highly accurate classification performance for discriminating normal epithelial cells from abnormal cells. The CNN exhibits high accuracy in identifying MCF10A, Cd-treated MCF10A and BT474 cells from single cell phase images, as shown in Table 3 and Figure 9(A, B).
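For reference, the accuracy, sensitivity and specificity used to report binary classification performance can be computed from predicted and true labels as sketched below. Treating the abnormal class as the positive label (1) is an assumption made here for illustration; the helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    sensitivity: float
    specificity: float


def binary_metrics(true_labels, predicted_labels, positive=1):
    """Compute accuracy, sensitivity and specificity for a binary classifier."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == positive and pred == positive:
            tp += 1   # abnormal cell correctly flagged
        elif truth == positive:
            fn += 1   # abnormal cell missed
        elif pred == positive:
            fp += 1   # normal cell incorrectly flagged
        else:
            tn += 1   # normal cell correctly passed
    total = tp + tn + fp + fn
    return BinaryMetrics(
        accuracy=(tp + tn) / total,
        sensitivity=tp / (tp + fn) if tp + fn else float("nan"),
        specificity=tn / (tn + fp) if tn + fp else float("nan"),
    )
```

For a screening application, sensitivity (the fraction of abnormal cells detected) is typically the more critical of the two rates, since a missed abnormal cell is costlier than a false alarm.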

DISCUSSION AND CONCLUSION
The combination of FNAC with FC provides a cheaper, quicker and reliable alternative to standard histological evaluation for finding cancer cells [33,34]. However, current FC methods have disadvantages: single-cell morphological information is inaccessible [35], and fluorescent assays of biochemical markers may be misleading if the markers are not exclusively expressed in cancer cells [35][36][37]. Sample preparation usually involves tedious staining processes, and any mishandling can cause low test sensitivity [38]. In comparison, the holographic cytometer is a label-free imaging modality that provides a morphology-based phenotype profile for each cell line, without the need for fluorescent markers and with very little sample preparation. High throughput, enabled by high-speed camera acquisition synchronized with pulsed illumination and fast cell flow, provides an abundance of single cell imaging data that is suitable for CNN classification. In Table 3, our CNN model performance demonstrates that deep learning achieves 98-99% accuracy in distinguishing between normal (MCF10A) and abnormal (Cd-treated MCF10A and BT474) cells. Figure 8(A-H) shows that the Cd-treated cell line exhibits intermediate morphological changes that fall between those of normal breast epithelial cells and breast cancer cells. For certain parameters such as mean phase, the Cd-treated cell line overlaps more with the breast cancer cells than with normal cells. By evaluating the similarities and differences of morphological parameters across the cell lines, we can potentially use these phenotype profiles to develop biomarkers that identify carcinogen-exposed cells.
Our analysis results show that the modality has great potential to serve as an abnormal cell identifier. Trained with large numbers of abnormal and normal healthy cells, the logistic regression algorithm and CNN are tuned to be highly sensitive and specific to abnormal cells. Among the three binary classification pairs, MCF10A/BT474 classification shows the best performance, with a mean accuracy of 99.3% and the highest sensitivity and specificity. The MCF10A/BT474 pair and the MCF10A/Cd-treated MCF10A pair have comparable accuracy levels (98-99%) in logistic regression and CNN, as shown in Table 1 and Table 3. Because the Cd-treated MCF10A cells exhibit a greater overlap in morphological features with BT474 (Figure 8), it is within expectations that our models exhibit lower performance in classifying the Cd-treated MCF10A/BT474 pair (∼90%, Table 1). Using these two classification methods illustrates how each morphological difference contributes differently to the classification accuracy. CNN has proven to be a powerful tool for classifying the three cell lines, while logistic regression provides insight into which traits play more prominent roles in classification. In general, the system is most accurate in differentiating between abnormal cells and healthy cells.
We note that the majority of the phenotypic variation can be explained by just four parameters that differ with statistical significance: cell area, OV, major axis length and mean phase. The logistic regression model achieves adequate accuracy levels (89-95%, Table 2) in identifying normal/abnormal cells with only these four parameters as input. Thus, in practice, the system can operate fairly effectively by monitoring fewer morphological parameters, which shortens the processing time. Alternatively, inclusion of more parameters offers additional diagnostic capability in classifying larger, more diverse cell populations with greater variability.
In previous work, we used a flow assay to investigate arsenic-treated epithelial cells with QPI. That study found that tumorigenic cells show reduced shear stiffness compared to non-tumorigenic cells [24]. Indeed, reduced stiffness is an essential feature for invasion through promotion of epithelial to mesenchymal transition. Here, we find that Cd treatment induces morphological changes in normal cells that resemble those of cancer cells. Combined, these results point to a potential means for identifying early signs that cells have malignant potential. Identification of such cells could have a significant impact on cancer detection and selection of therapeutic action. Future investigations will seek to identify the Cd threshold required for the onset of phenotypic differences and to produce tumorigenic cell lines. We are also interested in investigating the effects of Cd treatment on disorder strength, which has been shown to be a surrogate biomarker of cell stiffness [39].
In conclusion, the HC system shows promising performance in enabling high throughput analysis of cell samples. Here the analysis demonstrates the feasibility of producing effective biomarkers for cancer cells and carcinogen-exposed cells. The results suggest these biomarkers offer the possibility that the system can be developed into a diagnostic tool for early-stage cancer by analysing large numbers of individual cell images. Incorporating additional features such as fluorescence and cell stiffness measurements in the future will further broaden utility.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusion of this article will be made available by the author, without undue reservation.

AUTHOR CONTRIBUTIONS
CC, assisted by HP, performed all experiments and data analysis and wrote the manuscript. HP developed the cell-tracking algorithm. AW supervised the entire project and critically revised the manuscript.

FUNDING
This work was supported by NIH grant 1R21ES029791.