AUTHOR=Kalweit Gabriel, Klett Anusha, Silvestrini Paula, Rahnfeld Jens, Naouar Mehdi, Vogt Yannick, Infante Diana, Berger Rebecca, Duque-Afonso Jesús, Hartmann Tanja Nicole, Follo Marie, Bodurova-Spassova Elitsa, Lübbert Michael, Mertelsmann Roland, Boedecker Joschka, Ullrich Evelyn, Kalweit Maria TITLE=Leveraging a foundation model zoo for cell similarity search in oncological microscopy across devices JOURNAL=Frontiers in Oncology VOLUME=15 YEAR=2025 URL=https://www.frontiersin.org/journals/oncology/articles/10.3389/fonc.2025.1480384 DOI=10.3389/fonc.2025.1480384 ISSN=2234-943X ABSTRACT=Background: Cellular imaging analysis using the traditional retrospective approach is extremely time-consuming and labor-intensive. Although AI-based solutions are available, these approaches rely heavily on supervised learning techniques that require large, high-quality labeled datasets from the same microscope to be reliable. In addition, primary patient samples are often heterogeneous cell populations and must be stained to distinguish the cellular subsets. The resulting imaging data are analyzed and labeled manually by experts. A method to distinguish cell populations across imaging devices without staining or extensive manual labeling would therefore help immensely in gaining real-time insights into cell population dynamics, especially for recognizing specific cell types and states in response to treatment.

Objective: We aim to develop an unsupervised approach that uses general vision foundation models, trained on diverse and extensive imaging datasets, to extract rich visual features for cell analysis across devices, including both stained and unstained live cells. Our method, Entropy-guided Weighted Combinational FAISS (EWC-FAISS), uses these models purely in inference-only mode, without task-specific retraining on the cellular data.
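The inference-only use of frozen backbones can be illustrated with a minimal sketch. The backbone names, embedding widths, and plain L2 normalization below are illustrative assumptions; the paper's actual entropy-guided weighting of the combined embeddings is not reproduced here.

```python
import numpy as np

def l2_normalize(v, eps=1e-12):
    """Scale a vector to unit length so no single backbone dominates the joint embedding."""
    return v / (np.linalg.norm(v) + eps)

def combine_embeddings(per_model_embeddings):
    """Concatenate per-model embeddings for one image into a single joint vector."""
    return np.concatenate([l2_normalize(e) for e in per_model_embeddings])

# Toy example: three frozen backbones with different (hypothetical) feature widths.
emb_a = np.random.rand(8)   # stand-in for e.g. a DINO feature
emb_b = np.random.rand(4)   # stand-in for e.g. a CLIP feature
emb_c = np.random.rand(6)   # stand-in for e.g. a SWIN feature
joint = combine_embeddings([emb_a, emb_b, emb_c])
print(joint.shape)  # (18,)
```

Each image is pushed through every frozen model once; only the concatenated vector is stored, so no backbone gradients or retraining are needed.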
Combining the generated embeddings in an efficient and adaptive k-nearest neighbor search allows automated, cross-device identification of cell types and states, providing a strong basis for AI-assisted cancer therapy.

Methods: We used two publicly available datasets. The WBC dataset includes 14,424 images of stained white blood cell samples from patients with acute myeloid and lymphoid leukemia, as well as from patients without leukemic pathology. The LISC dataset comprises 257 images of white blood cell samples from healthy individuals. We generated four in-house datasets using the JIMT-1 breast cancer cell line as well as the Jurkat and K562 leukemic cell lines. These datasets were acquired with the Nanolive 3D Cell Explorer-fluo (CX-A) holotomographic microscope and the BioTek Lionheart FX automated brightfield microscope. Images from the in-house datasets were manually annotated with the Roboflow software. To generate the embeddings, we used and optimized a concatenated combination of SAM, DINO, ConvNeXT, SWIN, CLIP, and ViTMAE. The combined embeddings served as input for the adaptive k-nearest neighbor search, building an approximate Hierarchical Navigable Small World (HNSW) FAISS index. We compared EWC-FAISS to fully fine-tuned ViT classifiers with DINO and SWIN backbones, to a ConvNeXT architecture, and to NMTune, a lightweight domain-adaptation method with a frozen backbone.

Results: EWC-FAISS performed competitively with the baselines on the original datasets in terms of macro accuracy, the unweighted mean of the per-class accuracies, which treats all classes equally. EWC-FAISS ranked second on the WBC dataset (macro accuracy: 97.6 ± 0.2), first for cell state classification from Nanolive (macro accuracy: 90 ± 0), and performed comparably for cell type classification from Lionheart (macro accuracy: 87 ± 0).
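The k-nearest-neighbor classification step over combined embeddings can be sketched as follows. The paper builds an approximate HNSW index with FAISS for scalability; this illustration substitutes exact brute-force search, which returns the same neighbors on small data. All vectors and labels below are synthetic placeholders, not the paper's data.

```python
import numpy as np
from collections import Counter

def knn_predict(index_vecs, index_labels, query, k=3):
    """Return the majority label among the k nearest stored embeddings (exact search)."""
    dists = np.linalg.norm(index_vecs - query, axis=1)       # Euclidean distances
    nearest = np.argsort(dists)[:k]                          # indices of k closest vectors
    return Counter(index_labels[i] for i in nearest).most_common(1)[0][0]

# Two synthetic embedding clusters standing in for two cell states.
rng = np.random.default_rng(0)
live = rng.normal(0.0, 0.1, size=(20, 18))
apo = rng.normal(1.0, 0.1, size=(20, 18))
vecs = np.vstack([live, apo])
labels = ["live"] * 20 + ["apoptotic"] * 20

query = rng.normal(1.0, 0.1, size=18)   # a query near the "apoptotic" cluster
print(knn_predict(vecs, labels, query))  # apoptotic
```

In practice, swapping the brute-force search for a FAISS HNSW index changes only the index-building and lookup calls, not the majority-vote logic; this separation is what makes the index cheap to rebuild compared with retraining a classifier.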
On transfer to out-of-distribution (OOD) datasets that the model had not seen during training, EWC-FAISS consistently outperformed the other baselines. On the LISC dataset, EWC-FAISS achieved a macro accuracy of 78.5 ± 0.3, compared to DINO FT's 17 ± 1, SWIN FT's 44 ± 14, ConvNeXT FT's 45 ± 9, and NMTune's 52 ± 10. For cell state classification from Lionheart, EWC-FAISS reached a macro accuracy of 86 ± 1, while DINO FT, SWIN FT, and ConvNeXT FT achieved 65 ± 11, 68 ± 16, and 81 ± 1, respectively, and NMTune 81 ± 7. For the transfer of cell type classification from Nanolive, EWC-FAISS attained a macro accuracy of 85 ± 0, compared to DINO FT's 24.5 ± 0.9, SWIN FT's 57 ± 6, ConvNeXT FT's 54 ± 4, and NMTune's 63 ± 4. In addition, building EWC-FAISS after embedding generation was substantially faster than training DINO FT (∼6 minutes versus >10 hours). Finally, EWC-FAISS performed comparably in distinguishing cancerous cell lines from peripheral blood mononuclear cells, with a mean accuracy of 80 ± 5 versus CellMixer's 79.7.

Conclusion: We present a novel approach to identifying cell lines and primary cells by identity and state from images acquired across imaging platforms that vary in resolution, magnification, and image quality. Despite these differences, we show that our efficient, adaptive k-nearest neighbor search pipeline can be applied to a large image dataset containing different cell types and can effectively differentiate between the cells and their states, such as live, apoptotic, or necrotic. The approach has several applications, particularly in distinguishing cell populations in patient samples and in monitoring therapy.
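The macro accuracy reported throughout the results (the unweighted mean of per-class accuracies) can be computed with a few lines of standard Python. The labels below are toy values for illustration only.

```python
from collections import defaultdict

def macro_accuracy(y_true, y_pred):
    """Average the per-class accuracies so rare classes count as much as common ones."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)

# Toy example: "live" is 2/3 correct, "necrotic" is 1/1 correct,
# so macro accuracy = (2/3 + 1) / 2 ≈ 0.833, even though overall
# (micro) accuracy would be 3/4 = 0.75.
y_true = ["live", "live", "live", "necrotic"]
y_pred = ["live", "live", "necrotic", "necrotic"]
print(round(macro_accuracy(y_true, y_pred), 3))  # 0.833
```

Because each class contributes equally regardless of its frequency, this metric is well suited to the imbalanced cell-type populations described above.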