Deep learning-based image classification for microstructural analysis in computational materials science

Liu, Haiyan; Zhu, Penghua; Tan, Chenyu

doi:10.3389/fmats.2025.1648653

ORIGINAL RESEARCH article

Front. Mater., 20 January 2026

Sec. Computational Materials Science

Volume 12 - 2025 | https://doi.org/10.3389/fmats.2025.1648653

This article is part of the Research TopicAdvancing Computational Material Science and Mechanics through Integrated Deep Learning ModelsView all 8 articles

Deep learning-based image classification for microstructural analysis in computational materials science

Haiyan Liu¹

Penghua Zhu²*

Chenyu Tan³

¹North China Institute of Aerospace Engineering, Langfang, Hebei, China
²Cangzhou Normal University, Cangzhou, Hebei, China
³Changchun University of Technology, Changchun, Jilin, China

Introduction: Recently, the integration of deep learning techniques and computational materials science has catalyzed significant advances in the microstructural analysis of materials, particularly through the lens of multiscale, high-dimensional imaging data. However, conventional models often fall short in capturing the intricate topology and spatial variability that define realistic microstructural patterns, limiting their ability to inform material property predictions, inverse design, and structural synthesis.

Methods: To overcome these challenges, we introduce an innovative deep learning framework designed for microstructural image classification and representation learning, incorporating physical, geometric, and topological constraints directly into the training process. Our method, centered on the structured generative model MorphoTensor, introduces hierarchical tensorial embeddings that retain directionality, anisotropy, and spatial locality—features crucial for realistic material modeling. We further incorporate a Topology-Aware Latent Refinement strategy, which couples persistent homology with differentiable approximations of Betti numbers to enforce topological consistency and augment microstructural diversity. Unlike existing data-driven pipelines, our framework seamlessly integrates statistical encoding, topologicalization, and latent manifold alignment within a unified architecture, ensuring robustness across diverse datasets including phase-field simulations and real microscopy data.

Results and Discussion: Empirical evaluations on benchmark and experimental datasets demonstrate that our method significantly outperforms standard convolutional and autoencoding baselines in accuracy, stability, and generalization. Moreover, our approach aligns closely with the ongoing efforts in the broader computational materials and mechanics communities to build interpretable, physically informed, and adaptable deep learning systems. These contributions illustrate the potential of structured deep generative modeling as a foundational tool for advancing intelligent microstructure analysis and design in materials informatics.

1 Introduction

The rapid advancement of computational materials science has made it possible to simulate and analyze with unprecedented accuracy the microstructural features of materials (Chen C.-F. et al., 2021). However, accurately classifying these microstructures remains a challenging task due to the intricate patterns, varying scales, and diverse morphologies present in materials data (Hong et al., 2021). Traditional image analysis techniques struggle to generalize across different material systems, leading to inconsistent performance (Maurício et al., 2023). Therefore, there is a growing necessity for more robust and adaptive methods to interpret microstructural images. Deep learning, especially convolutional neural networks (CNNs), has emerged as a powerful solution for such tasks, not only enhancing classification accuracy but also enabling the discovery of subtle structural patterns that are difficult to identify through manual or conventional computational methods (Touvron et al., 2021). The integration of deep learning into microstructural analysis holds promise for accelerating materials discovery, optimizing material properties, and improving predictive modeling capabilities (Wang et al., 2022).

Initial studies approached microstructural image interpretation through classical computer vision techniques that emphasized low-level descriptors and algorithmic rules. These methods typically relied on predefined image processing operations such as edge detection, texture analysis, and morphological transformations (Tian et al., 2020). The extracted features were then used to construct visual representations that could be manually classified or interpreted by domain experts. Such techniques offered clear interpretability and were relatively straightforward to implement, making them well-suited for early investigations into structured or periodic microstructures (Yang et al., 2021). However, their effectiveness was largely constrained to idealized or synthetic datasets, where visual patterns exhibited strong regularity and minimal noise. In real-world materials, microstructures often display high variability in scale, orientation, and contrast, compounded by imaging artifacts and inter-sample heterogeneity (Hong et al., 2020). Classical descriptors, being low level and often linear in nature, lacked the expressiveness to model these complexities. Consequently, their generalization ability across different material systems, imaging modalities, or sample preparation methods was limited. In response to these shortcomings, the field began transitioning toward more adaptive and data-responsive frameworks. Researchers introduced semi-automated pipelines that combined classical feature extraction with rule-based decision trees or clustering algorithms, aiming to reduce the burden of manual annotation while improving consistency (Sun et al., 2022). These hybrid methods offered improved flexibility and some resilience to noise and structural diversity, but they still depended heavily on expert knowledge to define relevant features and threshold values. As the demand for scalable and generalizable microstructural analysis grew—particularly in high-throughput materials discovery contexts—it became clear that more robust, data-driven modeling approaches were needed to cope with the growing complexity and volume of materials imaging data (Rao et al., 2021).

Building on this need for adaptability, researchers began integrating statistical modeling and pattern recognition techniques that allowed systems to learn from annotated examples rather than relying solely on fixed rule sets (Kim et al., 2022). This marked a methodological shift toward supervised learning paradigms, where algorithms were trained to associate input features with known output labels based on curated microstructural datasets. Methods like support vector machines (SVM), random forests, k-nearest neighbors (k-NN), and principal component analysis (PCA) have gained broad usage in tasks involving classification, clustering, and dimensionality reduction (Mai et al., 2021). These models were typically coupled with engineered feature extraction pipelines involving texture descriptors, histogram statistics, frequency domain transforms, and geometric quantifiers of microstructural morphology (Bostanabad et al., 2018). The resulting hybrid frameworks improved both prediction accuracy and computational efficiency relative to early rule-based approaches, particularly for moderately sized datasets where manual labeling was feasible. Their utility was demonstrated in tasks such as grain boundary classification, phase segmentation, and defect detection in polycrystalline or composite materials (Azizi et al., 2021). Despite these advances, the performance of such models remained tightly bound to the quality and representativeness of the input features. Because feature design was largely manual and guided by domain heuristics, important structural cues—especially those spanning multiple spatial scales or exhibiting irregular patterns—were often missed or poorly encoded (DeCost and Holm, 2015). Furthermore, these models lacked the hierarchical representation learning capacity necessary to capture complex dependencies and interactions within heterogeneous microstructures (Li et al., 2020). As a result, their generalization capability across diverse material systems, imaging resolutions, or sample preparation techniques was limited (Kalidindi and De Graef, 2015). These shortcomings highlighted the need for more expressive, automated, and data-adaptive frameworks that could learn robust feature representations directly from raw or minimally processed image data (Bhojanapalli et al., 2021).

The evolution toward fully trainable, end-to-end systems marked a significant turning point with the adoption of deep learning models (Zhang et al., 2020). Convolutional neural networks (CNNs) and their extensions enabled direct learning from raw image data, bypassing the need for handcrafted features (Roy et al., 2022). These architectures proved especially powerful in capturing hierarchical spatial patterns and adapting across different imaging settings (Zhu et al., 2020). The availability of larger datasets and computational resources further amplified their impact, allowing for deeper and more expressive models (Chen L. et al., 2021). The application of transfer learning techniques and domain-specific fine-tuning expanded their reach into data-limited areas of materials research. While challenges such as interpretability and generalization across modalities persist, the integration of deep learning with physics-guided modeling and generative frameworks continues to push the boundaries of automated microstructural analysis (Ashtiani et al., 2021).

Based on the limitations of feature dependence in ML and the data hunger and interpretability concerns of deep learning, we propose a hybrid method that leverages domain-specific inductive biases and lightweight attention-enhanced CNNs to balance performance, efficiency, and generalizability. Our approach integrates a shallow attention module that dynamically focuses on microstructural regions of interest, combined with transfer learning from a domain-specific pretraining phase, allowing the model to effectively classify diverse microstructures with fewer labeled examples. By embedding prior knowledge and enhancing feature saliency, our model addresses both the data efficiency and explainability challenges commonly faced in deep learning-based material image analysis. Furthermore, we evaluate the method across multiple datasets covering various material types and imaging resolutions, demonstrating its robustness and practical value for accelerating microstructural classification in computational materials science.

• We introduce a novel attention-augmented CNN architecture tailored for microstructural image classification, enabling dynamic focus on relevant texture features.

• The method exhibits high adaptability across different materials and imaging conditions, ensuring generalizability and efficiency in practical applications.

• Experimental results show that our model achieves superior classification accuracy with reduced training data requirements, outperforming conventional CNN baselines.

2 Related work

2.1 Convolutional networks for microstructures

The application of convolutional neural networks (CNNs) to microstructural image classification has emerged as a central theme in computational materials science. CNNs are well-suited to this domain due to their capability to capture hierarchical spatial features in image data, which is critical when analyzing complex textures and phase distributions inherent in materials microstructures (Masana et al., 2020). Early efforts focused on utilizing standard architectures such as AlexNet and VGGNet to distinguish between different grain morphologies, crystal orientations, and defect types. These models demonstrated strong performance on datasets of synthetic micrographs generated through phase-field simulations or molecular dynamics (Rezaei et al., 2025). Subsequent studies improved upon these methods by incorporating domain-specific augmentations and preprocessing techniques tailored to the nature of materials images. For, contrast normalization, orientation alignment, and noise filtering were often employed to standardize inputs and enhance feature salience (Sheykhmousa et al., 2020). Transfer learning from pretrained networks on natural image datasets such as HEDM has also been shown to significantly boost performance, particularly when labeled microstructural datasets are limited in size. Recent work has moved beyond mere classification to integrate CNNs with unsupervised learning and clustering to uncover latent structural patterns (Mascarenhas and Agarwal, 2021). Hybrid methods that combine CNN-based feature extractors with classical machine learning classifiers have proven effective in improving the generalizability of findings across diverse material systems. Such approaches underscore the adaptability of deep convolutional models in handling the heterogeneity and high dimensionality typical of microstructural data in materials informatics (Rezaei et al., 2024a).

2.2 Data augmentation and synthesis

Data scarcity remains a pressing challenge in the development of robust deep learning models for microstructural classification (Rezaei et al., 2024b). To address this, a variety of data augmentation and synthesis strategies have been employed. Basic augmentation techniques such as rotation, flipping, scaling, and elastic deformation are widely adopted to enhance model generalization and reduce overfitting. These transformations simulate the physical variability present in microstructural samples without altering their intrinsic material characteristics (Zhang et al., 2022). More sophisticated approaches leverage generative adversarial networks (GANs) to create realistic synthetic micrographs. GANs can learn the underlying distribution of microstructural images and generate high-fidelity examples that preserve critical statistical and textural properties. These synthetic datasets not only augment training corpora but also support model benchmarking under controlled conditions (Dai and Gao, 2021). Conditional GANs (cGANs) have further enabled the generation of class-specific microstructures, enhancing the diversity and utility of synthetic samples in supervised learning contexts. Another promising avenue involves the use of physics-informed simulations to generate labeled microstructural data. Phase-field modeling, Monte Carlo methods, and cellular automata simulations are commonly utilized to produce synthetic micrographs with known ground truths (Taori et al., 2020). These simulated datasets serve as a valuable source of training data, particularly for rare or experimentally inaccessible microstructural features. Integrating such data with real experimental micrographs through domain adaptation techniques can bridge the synthetic–real gap and improve model transferability to practical applications (Alotaibi et al., 2025).

2.3 Interpretable and physics-guided models

The integration of interpretability and physical priors into deep learning frameworks represents crucial research (Ru et al., 2025). Traditional CNNs, while powerful, often function as black boxes, providing little insight into the underlying material phenomena driving classification outcomes (Peng et al., 2022). To mitigate this, recent work has explored explainable AI (XAI) techniques to visualize salient features and activation maps. Methods such as Grad-CAM, Layer-wise Relevance Propagation, and occlusion sensitivity analysis have been applied to reveal which microstructural regions contribute most significantly to model predictions (Bazi et al., 2021). Parallel efforts aim to embed physical constraints directly into model architectures or loss functions. Physics-guided neural networks (PGNNs) and theory-informed loss formulations ensure that predictions are not only accurate but also consistent with known physical laws and microstructural mechanics. These approaches improve trustworthiness and facilitate integration with existing computational materials models (Zheng et al., 2022). For instance, incorporating symmetry operations, crystallographic invariants, and defect energetics into the learning pipeline enables the network to learn more meaningful and generalizable representations. Another stream of research involves the fusion of multimodal data—combining image data with scalar features such as composition, processing history, or mechanical properties (Dong H. et al., 2022). By constructing multi-input models or employing attention mechanisms, these frameworks can model complex structure–property–process relationships that govern material behavior. The emphasis on interpretability and physics consistency ensures that deep learning models serve not just as predictive tools but also as instruments for scientific discovery in materials science (Liu and Huang, 2025).

3 Methods

3.1 Overview

In this paper, we investigate the problem of microstructural analysis from a computational perspective, aiming to uncover latent patterns embedded within the complex topology and morphology of material microstructures. This problem is central to a variety of disciplines, including materials science, computational mechanics, and imaging analysis, where fine-grained structural understanding is indispensable for property prediction, synthesis, and optimization. Our method section unfolds as a comprehensive blueprint of the proposed framework, which is grounded in rigorous mathematical modeling, algorithmic innovation, and domain-specific reasoning. In the following sections, we articulate the methodology across three complementary components, each of which addresses a critical stage in the analytical process. The overall structure is designed to support a seamless transition from theoretical abstraction to practical implementation, thereby enhancing both interpretability and extensibility of the proposed pipeline. Section 3.2 lays the groundwork by introducing the essential formalism required for modeling microstructural data, and we establish the notation and mathematical foundations required to express microstructure fields, characterize their variability, and formulate the analytical objectives. These preliminaries include the symbolic encoding of spatial domains, the statistical representation of morphological features, and the formal expression of symmetry and invariance conditions. We also outline the high-level problem setting, emphasizing the role of probabilistic descriptors, topological constraints, and the challenges associated with high-dimensional microstructural manifolds. Section 3.3 introduces our core contribution—a novel generative mechanism tailored for microstructural representation learning, which we refer to as MorphoTensor. Unlike existing approaches that treat microstructure either as deterministic fields or fixed-resolution images, MorphoTensor incorporates hierarchical tensorial embeddings to preserve directional, scale-sensitive, and spatially localized information. This representation enables fine control over the expressivity and regularity of the model and accommodates domain priors such as anisotropy and periodicity. We also integrate latent Gaussian processes into the architecture to capture the uncertainty and multi-modality, ensuring robustness under incomplete or noisy observations. In Section 3.4, we introduce a complementary strategy we term Topology-Aware Augmented Encoding, which governs how microstructures are processed, interpreted, and regularized during learning. This strategy goes beyond conventional supervision or autoencoding schemes by embedding topological invariants—such as Betti numbers and persistence diagrams—into the optimization loop via differentiable approximations. This coupling between topological reasoning and geometric encoding forms a feedback system wherein local morphological consistency and global topological stability co-evolve during training. We explore a data augmentation and sampling regime inspired by persistent homology, which aids in generating diverse yet structurally coherent microstructures for both training and downstream applications.

Each of the three aforementioned sections is designed to build upon the previous one, progressively refining the microstructural analysis from abstract symbolic encoding to structured representations and then to intelligent processing strategies. The integration of these components enables a unified and extensible analysis framework capable of handling a broad spectrum of microstructural modalities—including binary phase fields, grayscale reconstructions, orientation maps, and multiphase composites. Throughout this methodological exposition, we remain anchored to the physical and statistical realities of microstructural data. This includes adherence to periodic boundary conditions, accommodation of multiscale heterogeneities, and respect for the sparsity and redundancy that typify real-world microstructures. Our framework is implemented in a modular fashion, enabling easy extension to supervised learning, inverse design, and uncertainty quantification tasks. Furthermore, the proposed methods are compatible with both synthetic benchmark datasets and empirical datasets derived from electron microscopy, X-ray tomography, and phase-field simulations. The methodology section of this paper lays out a rigorous, principled, and interpretable framework for microstructural analysis, including a formal problem encoding of microstructure variability and spatial characteristics; a generative modeling framework with structural priors and hierarchical embeddings; and a topology-aware processing strategy that couples geometric representation with topological reasoning. These components coalesce to form a holistic analytical toolkit, enabling robust learning and meaningful interpretation of complex material microstructures.

3.2 Preliminaries

Let $Ω \subset R^{d}$ represent a bounded domain in physical space that defines the spatial extent of a microstructure, where $d = 2$ corresponds to a planar setting and $d = 3$ to a volumetric one. A microstructure is modeled as a measurable function $u : Ω \to S$ , where $S$ denotes the material state space. Depending on the context, $S$ may be a discrete label set, a real-valued interval, or a manifold.

We define the space of admissible microstructures Equation 1 as

U = \{u \in L^{2} (Ω, S) : C (u) = 0\}, (1)

where $C : U \to R^{k}$ encodes a set of constraint functionals such as volume fraction, symmetry, or topology preservation.

To characterize microstructure variability, we consider a probability space $(Θ, F, P)$ , where each $θ \in Θ$ corresponds to a latent descriptor and induces a realization $u_{θ} \in U$ . This gives rise to a random field $θ \mapsto u_{θ}$ .

Let $Φ : U \to R^{m}$ denote a feature extractor mapping a microstructure to a finite-dimensional descriptor space, such as statistical moments, correlation functions, or topological invariants.

We define a metric $d_{U}$ measuring the dissimilarity between microstructures Equation 2.

d_{U} (u, v) = {(\int_{Ω} ‖ u (x) - v (x) ‖^{2} d x)}^{1 / 2} . (2)

Given a dataset $D = {u_{i}}_{i = 1}^{N} \subset U$ , we aim to learn a compact representation or generative process for the underlying distribution $P_{U}$ . We express this as finding a mapping Equation 3.

G : Z \to U, z \mapsto G (z), (3)

where $z \in Z \subset R^{k}$ is drawn from a known prior distribution $p_{Z} (z)$ , yielding the generative formulation $u \sim G (z)$ .

Structural constraints are enforced through functionals ${T_{j}}_{j = 1}^{q}$ such that Equation 4

T_{j} (u) = τ_{j}, \forall j \in \{1, \dots, q\}, (4)

including invariants like volume fraction or periodicity.

Table 1 provides an explicit mapping between the abstract constraint functionals $T_{j}$ introduced in the preliminaries and the concrete physical or geometric properties enforced during model training. This mapping helps bridge the mathematical formalism with domain-relevant material descriptors commonly used in microstructural analysis.

Table 1

Table 1. Mapping of constraint functionals $T_{j}$ to material descriptors used in experiments.

3.3 MorphoTensor

To effectively model microstructural variability under geometric, physical, and topological constraints, we propose a novel generative model termed MorphoTensor. This model integrates hierarchical tensor representations with stochastic latent encoding, enabling expressivity over multiscale spatial patterns while respecting the underlying microstructural physics (As shown in Figure 1).

Figure 1

Diagram illustrating two processes: Hierarchical Tensor Convolutions and Latent Spatial Warping with Spectral Interpolation and Residual Paths. The top section shows tensors undergoing convolution and spatial warping. The bottom section depicts tensor convolution leading to spectral interpolation and combination with residual paths.

Figure 1. Schematic diagram of the MorphoTensor architecture. The framework consists of three modules: (1) a Hierarchical Tensor-Based Generator that synthesizes physically plausible microstructure images using multiscale tensorial convolutions; (2) a Latent Spatial Warping Mechanism that introduces deformation fields to capture heterogeneity and anisotropy in microstructures; (3) a Differentiable Regularization module that enforces structural consistency and aligns latent representations with geometric and statistical priors.

3.3.1 Hierarchical Tensor-Based Generator

Let $z \in R^{k}$ be a latent vector sampled from a known prior distribution $p_{Z} (z)$ , typically $z \sim N (0, I_{k})$ . The generative model $G_{θ} : R^{k} \to U$ , parameterized by $θ$ , maps the latent vector to a structured microstructure function $u_{θ} = G_{θ} (z)$ defined over the spatial domain $Ω \subset R^{d}$ (as shown in Figure 2).

Figure 2

Flowchart illustrating a neural network process. It begins with

Figure 2. Schematic diagram of the Hierarchical Tensor-Based Generator framework. Microstructure inputs are encoded into tensor representations, followed by multiscale convolutions, attention mapping, and latent decoding. Spectral interpolation and residual connections are applied to maintain frequency-domain consistency and enhance feature expressiveness. The diagram highlights the flow of structural variables across different processing stages, supporting fine-grained feature learning and robust microstructure classification.

To capture multiscale and anisotropic textures, the generator is constructed using a hierarchy of tensor-valued convolutional layers, spectral interpolation modules, and directional filter banks.

The architecture is defined recursively as a depth- $L$ composition of learned transformations with residual integration Equation 5.

G_{θ} (z) = σ (\sum_{l = 1}^{L} T_{l} (z)), T_{l} (z) = ϕ_{l} ◦ U_{l} ◦ ϕ_{l - 1} ◦ \dots ◦ ϕ_{1} (z), (5)

where $ϕ_{l}$ is a learnable convolutional operator at scale $l$ , $U_{l}$ is a Fourier-domain interpolation that ensures smooth upsampling, and $σ$ is the final nonlinearity adapted to the output range, such as $sigmoid$ for grayscale porosity maps or $softmax$ for categorical phase fields.

Each convolutional layer $ϕ_{l}$ is defined using structured tensorial filters $K^{(l)} \in R^{C_{l} \times C_{l - 1} \times d \times d}$ , where $C_{l}$ is the number of output channels. These filters are constructed as expansions over directional basis functions Equation 6.

K_{i, j}^{(l)} (x) = \sum_{α = 1}^{r} λ_{i, j, α}^{(l)} ψ_{j, α}^{(l)} (x), with ψ_{j, α}^{(l)} \in H_{2} (Ω), (6)

where $H_{2} (Ω)$ is a Hilbert space of steerable wavelets or spherical harmonics, and $λ_{i, j, α}^{(l)}$ are learnable coefficients controlling the response in each direction. The filter rank $r$ determines the expressiveness of each tensor.

To preserve frequency-domain consistency across scales, we use spectral upsampling $U_{l}$ defined by zero-padding and inverse FFT Equation 7.

U_{l} (f) (x) = F^{- 1} (Pad (F [f])) (x), (7)

where $F$ and $F^{- 1}$ denote the discrete Fourier transform and its inverse, respectively, and $Pad (\cdot)$ is a zero-padding operator that enlarges the spatial resolution while maintaining alignment of dominant frequencies.

To ensure that the convolutional responses reflect localized, structured phenomena such as grains, inclusions, or fibers, we enforce local energy normalization on the output of each filter bank Equation 8.

\sum_{i = 1}^{C_{l}} \int_{Ω} | ϕ_{l}^{(i)} (x) |^{2} d x \leq ϵ_{l}, (8)

where $ϵ_{l}$ is a scale-dependent energy budget that constrains the expressiveness and avoids overfitting to high-frequency noise.

To maintain structural diversity across output samples, we apply instance-wise modulation to the filter responses via learned affine coefficients $(γ, β)$ conditioned on $z$ Equation 9.

ϕ_{l}^{mod} (x; z) = γ_{l} (z) \cdot ϕ_{l} (x) + β_{l} (z), (9)

which allows the generator to adjust local contrast and bias according to latent-conditioned semantics.

3.3.2 Latent Spatial Warping Mechanism

We enhance the generative capacity of the model by introducing a latent-driven spatial warping mechanism, where a coordinate deformation field modulates the geometry of the decoded microstructure. This warping function $ξ_{ϕ} : Ω \to Ω$ is constructed to capture spatial heterogeneity and geometric irregularities observed in real-world materials Equation 10.

ξ_{ϕ} (x) = x + δ \cdot \tanh (W_{ϕ} (x)), (10)

where $x \in Ω$ denotes the original coordinate in the spatial domain, $W_{ϕ} : Ω \to R^{d}$ is a deformation field generated by a shallow convolutional neural network with trainable parameters $ϕ$ , and $δ$ is a scalar hyperparameter controlling the amplitude of deformation. The $\tanh$ nonlinearity ensures bounded and smooth deformation behavior, promoting spatial continuity and regularity.

The warped field is computed through a pullback operation, where the decoded unwarped field ${\hat{u}}_{θ}$ is evaluated at the deformed coordinates Equation 11.

u_{θ} (x) = {\hat{u}}_{θ} (ξ_{ϕ} (x)) . (11)

This operation re-parameterizes the spatial layout of the field and allows the generator to model nonstationary features such as gradients, interfaces, and geometric anisotropy that cannot be captured by stationary convolutions alone.

To ensure invertibility and smoothness of the transformation, the deformation field is regularized using a Jacobian-based penalty. Let $J_{ϕ} (x)$ denote the Jacobian matrix of $ξ_{ϕ}$ at point $x$ Equation 12.

J_{ϕ} (x) = \nabla_{x} ξ_{ϕ} (x) = I + δ \cdot diag (1 - \tanh^{2} (W_{ϕ} (x))) \cdot \nabla_{x} W_{ϕ} (x), (12)

where $I$ is the identity matrix and the term involving the derivative of $\tanh$ ensures smooth gradient propagation.

We penalize extreme local distortion by enforcing a regularization on the Frobenius norm of the Jacobian deviation from identity Equation 13.

L_{warp} = \int_{Ω} {‖J_{ϕ} (x) - I‖}_{F}^{2} d x, (13)

which discourages excessive stretching or folding of the coordinate map and ensures physical plausibility of the warped domain.

To preserve volume and avoid folding, we introduce a determinant-based regularizer that promotes diffeomorphic mappings Equation 14.

L_{det} = \int_{Ω} {(\det J_{ϕ} (x) - 1)}^{2} d x . (14)

The total output field $u_{θ} (x)$ generated via warped coordinates inherits both statistical realism and geometric fidelity. During training, gradients flow through $ξ_{ϕ}$ and $W_{ϕ}$ , enabling the model to learn deformation patterns from data without supervision, while satisfying invertibility and smoothness constraints encoded via $L_{warp}$ and $L_{det}$ .

3.3.3 Differentiable Topological and Manifold Regularization

To ensure topological characteristics of generated microstructures are preserved, we embed a differentiable approximation of topological invariants into the training objective. In particular, we focus on Betti numbers $β_{k} (u)$ , which quantify the number of connected components $(k = 0)$ , loops $(k = 1)$ , and voids $(k = 2)$ in the binarized microstructure $u$ . The empirical mismatch between the generated and target topology is penalized by the following loss Equation 15.

L_{topo} (u) = \sum_{k = 0}^{d - 1} {(β_{k} (u) - {\bar{β}}_{k})}^{2}, (15)

where ${\bar{β}}_{k}$ is the dataset-averaged Betti number for dimension $k$ , and $d$ is the spatial dimension of the microstructure domain $Ω$ .

As direct gradients through topological features are not tractable, we adopt a differentiable proxy using persistent homology. Let ${persistence}_{i}^{(k)}$ denote the birth–death lifetime of the $i$ -th feature in the $k$ -th homology group. The smoothed topological count is then approximated Equation 16.

β_{k}^{ϵ} (u) = \sum_{i = 1}^{N_{k}} ρ_{ϵ} ({persistence}_{i}^{(k)}), (16)

where $ρ_{ϵ} (t) = \frac{1}{1 + \exp (- t / ϵ)}$ is a soft threshold function that emphasizes persistent (i.e., long-lived) topological features while suppressing noise artifacts. The parameter $ϵ > 0$ controls the sharpness of the approximation.

To further regularize the geometry of the latent space, we induce a Riemannian metric $g (z)$ on the latent manifold $Z$ via the Jacobian of the generator $G_{θ}$ Equation 17.

g_{i j} (z) = {⟨\frac{\partial G_{θ} (z)}{\partial z_{i}}, \frac{\partial G_{θ} (z)}{\partial z_{j}}⟩}_{L^{2} (Ω)}, (17)

which captures the sensitivity of the generated microstructure to changes in each latent direction and encodes the intrinsic geometry of the generative map.

To discourage excessive curvature in the latent space, which may lead to poorly generalizable representations, we introduce a curvature regularizer based on the Frobenius norm of the second-order derivatives Equation 18.

R_{curv} (z) = Tr (\nabla^{2} g (z) \cdot \nabla^{2} g {(z)}^{T}) . (18)

To encourage smooth topological variation across samples, we introduce a pairwise consistency loss over mini-batches. Let $u_{a} = G_{θ} (z_{a})$ and $u_{b} = G_{θ} (z_{b})$ be two generated microstructures from nearby latent codes Equation 19.

L_{betti - smooth} = \sum_{k = 0}^{d - 1} {(β_{k}^{ϵ} (u_{a}) - β_{k}^{ϵ} (u_{b}))}^{2}, (19)

which penalizes abrupt topological changes and aligns the learned manifold with the topology-aware data structure.

3.4 Topology-Aware Latent Refinement

To effectively train the proposed MorphoTensor model for microstructural generation, we develop a novel strategy termed Topology-Aware Latent Refinement (TLR). This strategy leverages both data-driven loss formulation and structure-aware regularization to achieve physically consistent, statistically expressive, and topologically faithful microstructure synthesis. The TLR approach integrates multiscale supervision, adaptive perturbation schemes, and homology-aligned optimization (as shown in Figure 3).

Figure 3

Diagram of a machine learning model with image and text encoders. An image of a dog running on grass is processed by the Image Encoder, producing a latent representation. The text

Figure 3. Schematic diagram of the Latent Refinement module. The module integrates three components: (1) Multiscale Variational Learning, which captures both global and local structural patterns; (2) Constraint Projection and Augmentation, which incorporates physical priors and adaptive sampling; (3) Latent Decoding with topological and structural regularization. Together, these components refine latent representations to ensure accurate reconstruction, improved generalization, and physically consistent microstructure synthesis.

3.4.1 Multiscale Variational Learning

Given a dataset of microstructures $D = {u_{i}}_{i = 1}^{N}$ , we introduce a stochastic encoder–decoder architecture to learn latent representations that preserve both global structure and fine-scale variability (as shown in Figure 4).

Figure 4

Diagram illustrating a complex machine learning architecture. It features three main sections: Global Patterns Expert, Multi-Scale Learning, and Multiscale Variational Learning. The Global Patterns Expert includes components like Mamba, Feed Forward, and Encoding Layer, using elements like SSM and Conv. Multi-Scale Learning processes include Long and Short-Range TS, Learning, and Patcher. The Multiscale Variational Learning section includes LWT, Local Window Attention, and Positional Encoding. The flow of data is indicated with arrows, showcasing paths through various stages, with outputs represented as patterns and variations.

Figure 4. Schematic diagram of the Multiscale Variational Learning. The figure contains three interconnected modules including Global Patterns Expert, which leverages Mamba-style stacked linear, convolutional, and state-space modeling (SSM) blocks to capture coarse global trends, followed by a feedforward layer; Long-Short Router with Multi-Scale Patcher, where long- and short-range time series (TS) are routed to low- and high-resolution learning paths with a probability split of $P_{L} = 0.63$ and $P_{S} = 0.37$ , respectively, before being fed into dedicated patterns and variation branches; and Multiscale Variational Learning, combining local window attention, positional encoding, and feedforward networks to encode high-frequency dynamics within hierarchical temporal representations.

Each input $u$ is encoded into a latent variable $z \in R^{k}$ through a variational posterior $q_{ϕ} (z | u)$ modeled as a multivariate Gaussian Equation 20.

q_{ϕ} (z | u) = N (μ_{ϕ} (u), diag (σ_{ϕ} {(u)}^{2})), (20)

where $μ_{ϕ}$ and $σ_{ϕ}$ are learned mappings implemented via convolutional neural networks with shared encoder weights $ϕ$ .

The overall training objective follows the variational autoencoder (VAE) framework and is designed to balance accurate reconstruction with latent regularity. The total VAE loss is written Equation 21.

L_{VAE} = E_{u \sim D} [E_{z \sim q_{ϕ} (z | u)} [L_{rec} (u, G_{θ} (z))] + D_{KL} (q_{ϕ} (z | u) ‖ p_{Z} (z))], (21)

where $p_{Z} (z)$ is a standard normal prior and $D_{KL}$ is the Kullback–Leibler divergence that regularizes the posterior.

To measure reconstruction quality, we adopt a scale-adaptive loss defined over a multiresolution decomposition of the microstructure. Let $S_{ℓ}$ denote a Gaussian pyramid downsampling operator at the scale level $ℓ$ and define the hierarchical loss Equation 22.

L_{rec} (u, \hat{u}) = \sum_{ℓ = 0}^{L} ω_{ℓ} \cdot ‖ S_{ℓ} u - S_{ℓ} \hat{u} ‖_{2}^{2}, (22)

where $\hat{u} = G_{θ} (z)$ is the output of the generator, and $ω_{ℓ}$ are scale-specific weights that emphasize higher-resolution errors more strongly, typically chosen as $ω_{ℓ} \propto 2^{ℓ}$ .

To further improve the expressiveness of the latent space, we inject structured noise into $z$ through a reparameterization trick Equation 23.

z = μ_{ϕ} (u) + σ_{ϕ} (u) ⊙ ϵ, ϵ \sim N (0, I_{k}), (23)

which allows gradients to propagate through the stochastic sampling process during training.

Moreover, to prevent posterior collapse and enhance diversity, we regularize the mutual information between $u$ and $z$ using a contrastive lower bound Equation 24.

L_{MI} = - \log \frac{\exp (s (u, z))}{\sum_{j = 1}^{B} \exp (s (u, z_{j}))}, (24)

where $s (u, z) = ⟨ f (u), g (z) ⟩$ is a similarity function computed over learned projections, and $B$ is the mini-batch size.

The total loss incorporates all terms with tunable weights $λ_{rec}, λ_{KL}, λ_{MI}$ Equation 25.

L_{total} = λ_{rec} L_{rec} + λ_{KL} D_{KL} + λ_{MI} L_{MI} . (25)

This objective enables learning of compact, expressive, and scale-aware latent representations tailored for microstructural variability.

4 Experimental setup

4.1 Dataset

For completeness, we also consider results on generic large-scale image datasets (HEDM Muralikrishnan et al., 2023) and (HTEM Steingrimsson et al., 2023) as pretraining/transfer baselines; these details are provided in the Appendix. WELD SEAM Dataset (Zhao et al., 2024) is a fine-grained image classification dataset comprising 8,189 images of weld seams collected from various manufacturing environments. It spans 102 categories, each representing different types or conditions of weld seams. Each category contains between 40 and 258 images characterized by large variations in scale, pose, illumination, and surface finish. The dataset focuses on challenging intra-class similarity and inter-class variation as many weld seams share visual features. The images were obtained through industrial inspection systems and labeled using a combination of automated tools and expert manual verification. It has been widely used for evaluating fine-grained classification algorithms and defect detection models. The high-resolution imagery supports detailed texture and surface feature extraction, crucial for distinguishing subtle differences in weld quality. The MID Dataset (Jackson et al., 2022) contains 5,640 texture images organized into 47 categories based on human-centric attributes such as striped, dotted, fibrous, and bumpy. It emphasizes perceptual texture properties rather than object identities. Each category includes 120 images collected from diverse natural and artificial sources. The dataset supports research in texture recognition, segmentation, and attribute-based representation learning. All images are annotated according to describable attributes defined by human perception rather than material composition or object context. This makes MID suitable for studying mid-level visual attributes and for training models that interpret abstract semantic properties. The dataset challenges models to generalize texture recognition across variations in scale, illumination, and viewpoint.

In this work, EBSD orientation maps and synthetic phase-field simulations are used as the primary datasets for evaluating classification performance and topological consistency.

4.2 Experimental details

All experiments were conducted using the PyTorch framework on a server equipped with NVIDIA A100 GPUs (80 GB memory, CUDA 12.1). Mixed-precision training was adopted to accelerate convergence and reduce memory usage. For the HEDM and HTEM datasets, all images were resized to $224 \times 224$ with central cropping for validation; for EBSD and phase-field datasets, we used $256 \times 256$ inputs with random rotation, flipping, and grayscale jittering to preserve structural variability. The backbone was initialized from ResNet-50 pretrained on HEDM, with dataset-specific classifier heads added. Training was performed with the AdamW optimizer (weight decay = $1 \times 1 0^{- 4}$ , initial learning rate = $3 \times 1 0^{- 4}$ , cosine annealing schedule, 10-epoch warm-up). Batch sizes were 256 for HEDM and 64 for the other datasets. Regularization included label smoothing $(ϵ = 0.1)$ , dropout (0.5), random erasing, and RandAugment. Evaluation metrics included top-1/top-5 accuracy, mean per-class accuracy, F1 score, and multi-class AUC. For multi-class settings, AUC was computed using a macro-averaged one-vs-rest (OvR) scheme, ensuring balanced treatment of all classes. All reported numbers include 95% confidence intervals (CIs), calculated over three independent runs using the Student t-distribution. Statistical significance annotations are standardized across all tables, where $p < 0.05$ , $p < 0.01$ , and $p < 0.001$ are indicated by *, **, and ***, respectively. This ensures consistent reporting and reliable comparison across experiments.

This operator ensures that the generated microstructure maintains a target volume fraction by re-normalizing pixel intensities. The process is differentiable and integrated into the training loop, allowing physics-aware backpropagation without introducing hard constraints in Algorithm 1.

Algorithm 1

Algorithm 1. Constraint Projection Operator Πfor Volume Fraction.

All reported results are averaged over three independent runs. 95% confidence intervals are computed using the Student’s t-distribution with 2 degrees of freedom. Statistical significance tests (two-tailed t-tests) are performed where appropriate.

For multi-class AUC computation, we adopt a macro-averaging strategy using the one-vs-rest approach, which calculates the AUC independently for each class and then takes the unweighted mean. This method is particularly suitable for imbalanced class distributions, such as those in the EBSD and phase-field datasets.

The following training configurations apply specifically to microstructural datasets (EBSD and phase-field). For these datasets, input images are resized to 256 $\times$ 256, and the model is trained from scratch using domain-specific augmentations including rotation, flipping, and grayscale jittering. Evaluation is based on mean per-class accuracy, F1 score, and structural metrics. For completeness, results on generic image classification datasets (HEDM and HTEM) are reported in Appendix, and their corresponding training details are described therein.

All structural metrics are computed on normalized microstructure fields: 2-point correlation values are measured on images scaled to the range [0,1], while length-based descriptors (e.g., chord length and phase size) are expressed as fractions of the total image width.

4.3 Benchmarking against leading methods

Results on general-purpose datasets (HEDM and HTEM) are used only as generic pretraining and transfer baselines. Our primary evaluations focus on EBSD and phase-field datasets, which directly reflect the microstructural nature of the problem.

To address domain-specific validation, we incorporated two microstructure-centered datasets: (i) EBSD orientation maps from Ti-Al alloy specimens in Table 2 and (ii) synthetic phase-field simulations using the Cahn-Hilliard equation in Table 3. The following tables report classification metrics and topology-aware evaluation on these datasets.

Table 2

Table 2. Classification performance on EBSD and Phase-field datasets with 95% confidence intervals. Values are reported as mean $\pm$ 95% CI over three runs. Significance is denoted as * $(p < 0.05)$ , ** $(p < 0.01)$ , and *** $(p < 0.001)$ .

Table 3

Table 3. Microstructure-aware metric comparison on generated vs. real samples (95% confidence intervals). Values are reported as mean $\pm$ 95% CI over three runs. Significance is denoted as * $(p < 0.05)$ , ** $(p < 0.01)$ , and *** $(p < 0.001)$ . All reported structural metrics are based on normalized images: 2-point correlation is computed on [0,1] scaled fields, and length features are expressed as fractions of image width.

Table 4 presents the results of our sensitivity analysis. The performance peak occurs at $ϵ = 0.10$ and $λ_{topo} = 0.5$ , suggesting a balanced trade-off between smooth differentiability and structural fidelity. Larger $ϵ$ values reduce the sensitivity to topological nuances, while overly aggressive topology weights (e.g., $λ = 2.0$ ) distort the geometric manifold, hurting classification accuracy. These findings guide the robust tuning of topological constraints in generative training for microstructure modeling.

Table 4

Table 4. Sensitivity analysis of $ϵ$ and $λ_{topo}$ with 95% confidence intervals. Values are reported as mean $\pm$ 95% CI over three runs. Significance is denoted as * $(p < 0.05)$ , ** $(p < 0.01)$ , and *** $(p < 0.001)$ . All reported structural metrics are based on normalized images: 2-point correlation is computed on [0,1] scaled fields, and length features are expressed as fractions of image width.

Table 5 shows that our model preserves both semantic accuracy and topological structure under significant shifts in porosity and grain morphology. Unlike CNNs or VAEs, MorphoTensor maintains low persistence diagram distance and Euler number error, confirming its robustness in generalizing to unseen structural regimes.

Table 5

Table 5. Robustness evaluation on out-of-distribution (OOD) morphologies with 95% confidence intervals. Values are reported as mean $\pm$ 95% CI over three runs. Significance is denoted as * $(p < 0.05)$ , ** $(p < 0.01)$ , and *** $(p < 0.001)$ . All reported structural metrics are based on normalized images: 2-point correlation is computed on [0,1] scaled fields, and length features are expressed as fractions of image width.

All structural metrics are computed on normalized microstructure fields. 2-point correlation functions are evaluated over images scaled to [0, 1], and length-based descriptors (e.g., chord length and phase size) are expressed as fractions of the total image width. These conventions ensure comparability across datasets of different spatial resolutions.

4.4 Feature removal study

Results on general datasets (HEDM and HTEM) provide a better understanding of the model's baseline performance in large-scale environments.

Table 6 demonstrates the effects of different components of the MorphoTensor generative pipeline on the classification performance. The inclusion of synthetic microstructures generated under topological and physical constraints consistently improves the accuracy, recall, and AUC. Ablating topology loss or latent warping results in a degradation of performance, indicating that these modules are critical for preserving structural consistency in generated samples and enhancing classifier robustness.

Table 6

Table 6. Ablation study on the impact of MorphoTensor pipeline components in classification tasks (EBSD dataset).

Table 7 presents the computational overhead introduced by the persistent homology (PH) loss during training on the EBSD dataset. Compared to the baseline configuration without PH losses, the inclusion of topology-aware regularization increases the per-epoch training time by approximately 28% and incurs a modest memory overhead of 319 MB. On a per-batch basis, the training time increases from 98.3 m to 126.0 m. Despite this increase in cost, the performance gains in topological fidelity and microstructure realism (as discussed below) justify the additional computation, especially in high-stakes applications involving materials informatics or microstructure-sensitive design tasks.

Table 7

Table 7. Runtime and memory overhead from persistent homology losses (EBSD dataset).

Table 8 compares the performance of three configurations on a publicly available EBSD dataset using both classification metrics and microstructure-aware topology metrics. While the CNN baseline achieves acceptable accuracy, it performs poorly on structural metrics such as two-point correlation error, chord length KL divergence (measured as a fraction of image width), and persistence diagram distance. Incorporating the MorphoTensor model without topology loss improves both accuracy and structure preservation. However, the full model with persistent homology loss achieves the best results across all metrics, indicating that topological regularization significantly enhances the geometric and physical plausibility of the generated or processed microstructures. This validates the core hypothesis of our work—that topology-aware generative modeling leads to structurally faithful representations in computational materials science.

Table 8

Table 8. Evaluation of MorphoTensor on EBSD dataset (microstructure-aware metrics).

For full reproducibility, we provide an anonymous code repository that includes training scripts, model configurations, and instructions for dataset preparation: https://snippets.cacher.io/snippet/0a9c95f0fa961047e7cd. The repository covers dataset-specific preprocessing (e.g., EBSD map cleaning and simulation of phase-field morphologies), hardware-agnostic training pipelines, and logging templates. This enables independent validation of all reported results. Upon final publication, this repository will be made publicly available under an open-source license.

5 Conclusions and future work

This work aims to overcome key shortcomings of conventional deep learning methods when applied to microstructural image analysis in the field of computational materials science. Existing methods often struggle with preserving the complex topology and spatial heterogeneity that characterize real-world material microstructures. To overcome this, we developed a novel deep learning framework built around the structured generative model MorphoTensor, which integrates physical, geometric, and topological priors into the learning process. This model introduces hierarchical tensorial embeddings that capture crucial characteristics such as directionality, anisotropy, and spatial locality. We proposed a Topology-Aware Latent Refinement (TALR) strategy, which leverages persistent homology and differentiable Betti numbers to ensure topological fidelity. This comprehensive design allows the model to unify statistical encoding, topological analysis, and latent space alignment. Our experiments, conducted across both synthetic and real microscopy datasets, confirm substantial improvements in classification accuracy, robustness, and generalization compared to conventional convolutional networks and autoencoders.

While the results are promising, there remain some limitations. First, the model’s reliance on complex topological tools such as persistent homology introduces computational overhead, which could hinder scalability in real-time or high-throughput applications. Second, although our model generalizes well across datasets, its performance on entirely unseen microstructural morphologies especially those outside the training distribution—still warrants further investigation. Looking ahead, future research should aim to enhance the computational efficiency of the TALR module and expand the framework to accommodate 3D volumetric datasets. Integrating active learning and physics-based simulation feedback loops also presents a compelling direction for enhancing the adaptability and physical validity of learned representations.

While our current framework operates on 2D microstructural images, the core components—tensorized encoding, spatial warping, topological losses, and physical constraints—generalize naturally to 3D volumetric data. Extending MorphoTensor to 3D would involve using 3D convolutional operators in the encoder and decoder, volumetric warping fields regularized via 3D Jacobian determinants, and persistent homology computed over voxelized inputs. Notably, efficient algorithms for computing persistent diagrams in 3D exist and can be integrated into the current training pipeline. This direction is particularly promising for applications involving X-ray tomography, 3D EBSD, or synthetic phase-field volumes and represents a key avenue for future work.

Beyond the architectural and learning-based contributions, our framework offers tangible benefits for downstream materials science workflows. For instance, the accurate preservation of chord-length distributions directly supports permeability estimation in porous materials and fatigue life modeling in polycrystalline alloys, where feature spacing and persistence affect crack initiation. Similarly, the ability to reproduce orientation distributions from EBSD-like inputs enhances the predictive modeling of anisotropic mechanical properties, such as elastic modulus and thermal expansion. By maintaining topological consistency and structural diversity, our method strengthens microstructure–property linkages critical to alloy design loops, defect screening, and performance certification in computational materials pipelines. These connections highlight the broader utility of topology-aware generative modeling in practical materials informatics tasks.

While additional experiments on HEDM and HTEM datasets are included in the Appendix as transfer baselines, the main evidence of our method is provided by EBSD and phase-field datasets, which are most representative of microstructural analysis.

Data availability statement

The original contributions presented in the study are included in the article/supplementary material. The code supporting the findings of this study is available via Cacher: https://snippets.cacher.io/snippet/0a9c95f0fa961047e7cd. Further inquiries can be directed to the corresponding author.

Author contributions

HL: Conceptualization, Methodology, Software, Validation, Writing – original draft. PZ: Formal analysis, Investigation, Data curation, Writing – original draft. CT: Writing – original draft, writing – review and editing, Visualization, Supervision, Funding acquisition.

Funding

The authors declare that financial support was received for the research and/or publication of this article. Research Projects of the 14th Five-Year Plan for Educational Science under the Hebei Provincial Department of Education: Research on the Construction of Industry-Education Integration Community in the Field of Aerospace Cybersecurity Testing (No. 2503105).

Conflict of interest

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The authors declare that no Generative AI was used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Alotaibi, J. G., Eid Alajmi, A., Kadirgama, K., Samylingam, L., Aslfattahi, N., Kok, C. K., et al. (2025). Enhancing engine oil performance with graphene-cellulose nanoparticles: insights into thermophysical properties and tribological behavior. Front. Mater. 12, 1549117. doi:10.3389/fmats.2025.1549117

CrossRef Full Text | Google Scholar

Ashtiani, F., Geers, A. J., and Aflatouni, F. (2021). An on-chip photonic deep neural network for image classification. Nature 606, 501–506. doi:10.1038/s41586-022-04714-0

PubMed Abstract | CrossRef Full Text | Google Scholar

Azizi, S., Mustafa, B., Ryan, F., Beaver, Z., Freyberg, J., Deaton, J., et al. (2021). “Big self-supervised models advance medical image classification,” in IEEE international conference on computer vision.

Google Scholar

Bazi, Y., Bashmal, L., Rahhal, M. M. A., Dayil, R. A., and Ajlan, N. A. (2021). Vision transformers for remote sensing image classification. Remote Sens. 13, 516. doi:10.3390/rs13030516

CrossRef Full Text | Google Scholar

Bhojanapalli, S., Chakrabarti, A., Glasner, D., Li, D., Unterthiner, T., and Veit, A. (2021). “Understanding robustness of transformers for image classification,” in IEEE international conference on computer vision.

Google Scholar

Bostanabad, R., Zhang, Y., Li, X., Kearney, T., Brinson, L. C., Apley, D. W., et al. (2018). Computational microstructure characterization and reconstruction: review of the state-of-the-art techniques. Prog. Mater. Sci. 95, 1–41. doi:10.1016/j.pmatsci.2018.01.005

CrossRef Full Text | Google Scholar

Chen, C.-F., Fan, Q., and Panda, R. (2021a). “Crossvit: Cross-attention multi-scale vision transformer for image classification,” in IEEE international conference on computer vision.

Google Scholar

Chen, L., Li, S., Bai, Q., Yang, J., Jiang, S., and Miao, Y. (2021b). Review of image classification algorithms based on convolutional neural networks. Remote Sens. 13, 4712. doi:10.3390/rs13224712

CrossRef Full Text | Google Scholar

Dai, Y., Gao, Y., and Liu, F. (2021). Transmed: transformers advance multi-modal medical image classification. Diagnostics 11, 1384. doi:10.3390/diagnostics11081384

PubMed Abstract | CrossRef Full Text | Google Scholar

DeCost, B. L., and Holm, E. A. (2015). A computer vision approach for automated analysis and classification of microstructural image data. Comput. Mater. Sci. 110, 126–133. doi:10.1016/j.commatsci.2015.08.011

CrossRef Full Text | Google Scholar

Dong, H., Zhang, L., and Zou, B. (2022a). Exploring vision transformers for polarimetric sar image classification. IEEE Trans. Geoscience Remote Sens. 60, 1–15. doi:10.1109/tgrs.2021.3137383

CrossRef Full Text | Google Scholar

Dong, X., Bao, J., Zhang, T., Chen, D., Gu, S., Zhang, W., et al. (2022b). Clip itself is a strong fine-tuner: achieving 85.7% and 88.0% top-1 accuracy with vit-b and vit-l on imagenet. arXiv Prepr. arXiv:2212.06138. Available online at: https://arxiv.org/abs/2212.06138.

Google Scholar

Elpeltagy, M., and Sallam, H. (2021). Automatic prediction of covid- 19 from chest images using modified resnet50. Multimedia tools Appl. 80, 26451–26463. doi:10.1007/s11042-021-10783-6

PubMed Abstract | CrossRef Full Text | Google Scholar

Feng, J., Tan, H., Li, W., and Xie, M. (2022). “Conv2next: reconsidering conv next network design for image recognition,” in 2022 international conference on computers and artificial intelligence technologies (CAIT) (IEEE), 53–60. Available online at: https://ieeexplore.ieee.org/abstract/document/10072172.

CrossRef Full Text | Google Scholar

Hong, D., Gao, L., Yao, J., Zhang, B., Plaza, A., and Chanussot, J. (2020). Graph convolutional networks for hyperspectral image classification. IEEE Trans. Geoscience Remote Sens. 59, 5966–5978. doi:10.1109/tgrs.2020.3015157

CrossRef Full Text | Google Scholar

Hong, D., Han, Z., Yao, J., Gao, L., Zhang, B., Plaza, A., et al. (2021). Spectralformer: rethinking hyperspectral image classification with transformers. IEEE Trans. Geoscience Remote Sens. 60, 1–15. doi:10.1109/tgrs.2021.3130716

CrossRef Full Text | Google Scholar

Jackson, J., Owsiak, A. P., Goertz, G., and Diehl, P. F. (2022). Getting to the root of the issue (s): expanding the study of issues in mids (the mid-issue dataset, version 1.0). J. Confl. Resolut. 66, 1514–1542. doi:10.1177/00220027221080967

CrossRef Full Text | Google Scholar

Kalidindi, S. R., and De Graef, M. (2015). Materials data science: current status and future outlook. Annu. Rev. Mater. Res. 45, 171–193. doi:10.1146/annurev-matsci-070214-020844

CrossRef Full Text | Google Scholar

Kim, H. E., Cosa-Linan, A., Santhanam, N., Jannesari, M., Maros, M., and Ganslandt, T. (2022). Transfer learning for medical image classification: a literature review. BMC Medical Imaging.Available online at: https://link.springer.com/article/10.1186/s12880-022-00793-7.

Google Scholar

Koonce, B. (2021a). “Efficientnet,” in Convolutional neural networks with swift for tensorflow: image recognition and dataset categorization (Springer), 109–123. Available online at: https://link.springer.com/book/10.1007/978-1-4842-6168-2.

Google Scholar

Koonce, B. (2021b). “Mobilenetv3,” in In Convolutional neural networks with swift for tensorflow: image recognition and dataset categorization. Springer, 125–144. Available online at: https://link.springer.com/book/10.1007/978-1-4842-6168-2.

Google Scholar

Li, B., Li, Y., and Eliceiri, K. (2020). “Dual-stream multiple instance learning network for whole slide image classification with self-supervised contrastive learning,” in Computer vision and pattern recognition.

Google Scholar

Liu, G., and Huang, H. (2025). Research progress of amorphous micro-nano structured materials. Front. Mater. 12, 1589830. doi:10.3389/fmats.2025.1589830

CrossRef Full Text | Google Scholar

Mai, Z., Li, R., Jeong, J., Quispe, D., Kim, H. J., and Sanner, S. (2021). Online continual learning in image classification: an empirical survey. Neurocomputing 469, 28–51. doi:10.1016/j.neucom.2021.10.021

CrossRef Full Text | Google Scholar

Masana, M., Liu, X., Twardowski, B., Menta, M., Bagdanov, A. D., and van de Weijer, J. (2020). Class-incremental learning: survey and performance evaluation on image classification. IEEE Trans. Pattern Analysis Mach. Intell. 45, 5513–5533. doi:10.1109/tpami.2022.3213473

PubMed Abstract | CrossRef Full Text | Google Scholar

Mascarenhas, S., and Agarwal, M. (2021). “A comparison between vgg16, vgg19 and resnet50 architecture frameworks for image classification,” in 2021 international conference on disruptive technologies for multi-disciplinary research and applications (CENTCON).

Google Scholar

Maurício, J., Domingues, I., and Bernardino, J. (2023). Comparing vision transformers and convolutional neural networks for image classification: a literature review. Appl. Sci. Available online at: https://www.mdpi.com/2076-3417/13/9/5521.

Google Scholar

Muralikrishnan, V., Liu, H., Yang, L., Conry, B., Marvel, C. J., Harmer, M. P., et al. (2023). Observations of unexpected grain boundary migration in srtio3. Scr. Mater. 222, 115055. doi:10.1016/j.scriptamat.2022.115055

CrossRef Full Text | Google Scholar

Peng, J., Huang, Y., Sun, W., Chen, N., Ning, Y., and Du, Q. (2022). Domain adaptation in remote sensing image classification: a survey. IEEE J. Sel. Top. Appl. Earth Observations Remote Sens. 15, 9842–9859. doi:10.1109/jstars.2022.3220875

CrossRef Full Text | Google Scholar

Rao, Y., Zhao, W., Zhu, Z., Lu, J., and Zhou, J. (2021). Global filter networks for image classification. Neural Inf. Process. Syst. Available online at: https://proceedings.neurips.cc/paper/2021/hash/07e87c2f4fc7f7c96116d8e2a92790f5-Abstract.html.

Google Scholar

Rezaei, S., Asl, R. N., Taghikhani, K., Moeineddin, A., Kaliske, M., and Apel, M. (2024a). Finite operator learning: bridging neural operators and numerical methods for efficient parametric solution and optimization of pdes. arXiv Prepr. arXiv:2407.04157. Available online at: https://arxiv.org/abs/2407.04157.

Google Scholar

Rezaei, S., Moeineddin, A., and Harandi, A. (2024b). Learning solutions of thermodynamics-based nonlinear constitutive material models using physics-informed neural networks. Comput. Mech. 74, 333–366. doi:10.1007/s00466-023-02435-3

CrossRef Full Text | Google Scholar

Rezaei, S., Asl, R. N., Faroughi, S., Asgharzadeh, M., Harandi, A., Koopas, R. N., et al. (2025). A finite operator learning technique for mapping the elastic properties of microstructures to their mechanical deformations. Int. J. Numer. Methods Eng. 126, e7637. doi:10.1002/nme.7637

CrossRef Full Text | Google Scholar

Roy, S. K., Deria, A., Hong, D., Rasti, B., Plaza, A., and Chanussot, J. (2022). Multimodal fusion transformer for remote sensing image classification. IEEE Trans. Geoscience Remote Sens. 61, 1–20. doi:10.1109/tgrs.2023.3286826

CrossRef Full Text | Google Scholar

Ru, J., Fang, Y., Guo, Y., Jiang, L., Liu, H., Li, X., et al. (2025). Rheology, permeability and microstructure of seawater-based slurry for slurry shield tunneling: insights from laboratory tests. Front. Mater. 12, 1592537. doi:10.3389/fmats.2025.1592537

CrossRef Full Text | Google Scholar

Sheykhmousa, M., Mahdianpari, M., Ghanbari, H., Mohammadimanesh, F., Ghamisi, P., and Homayouni, S. (2020). Support vector machine versus random forest for remote sensing image classification: a meta-analysis and systematic review. IEEE J. Sel. Top. Appl. Earth Observations Remote Sens. 13, 6308–6325. doi:10.1109/jstars.2020.3026724

CrossRef Full Text | Google Scholar

Steingrimsson, B., Agrawal, A., Fan, X., Kulkarni, A., Thoma, D., and Liaw, P. (2023). Construction of multi-dimensional functions for optimization of additive-manufacturing process parameters. arXiv Prepr. arXiv:2311.06398. Available online at: https://arxiv.org/abs/2311.06398

Google Scholar

Sun, L., Zhao, G., Zheng, Y., and Wu, Z. (2022). Spectral–spatial feature tokenization transformer for hyperspectral image classification. IEEE Trans. Geoscience Remote Sens. 60, 1–14. doi:10.1109/tgrs.2022.3144158

CrossRef Full Text | Google Scholar

Taori, R., Dave, A., Shankar, V., Carlini, N., Recht, B., and Schmidt, L. (2020). Measuring robustness to natural distribution shifts in image classification. Neural Inf. Process. Syst. Available online at: https://proceedings.neurips.cc/paper/2020/hash/d8330f857a17c53d217014ee776bfd50-Abstract.html.

Google Scholar

Tian, Y., Wang, Y., Krishnan, D., Tenenbaum, J., and Isola, P. (2020). “Rethinking few-shot image classification: a good embedding is all you need?,” in European conference on computer vision.

Google Scholar

Touvron, H., Bojanowski, P., Caron, M., Cord, M., El-Nouby, A., Grave, E., et al. (2021). Resmlp: feedforward networks for image classification with data-efficient training. IEEE Trans. Pattern Analysis Mach. Intell. 45, 5314–5321. doi:10.1109/tpami.2022.3206148

PubMed Abstract | CrossRef Full Text | Google Scholar

Touvron, H., Cord, M., and Jégou, H. (2022). “Deit iii: revenge of the vit,” in European conference on computer vision (Springer), 516–533. doi:10.1007/978-3-031-20053-3_30

CrossRef Full Text | Google Scholar

Wang, X., Yang, S., Zhang, J., Wang, M., Zhang, J., Yang, W., et al. (2022). Transformer-based unsupervised contrastive learning for histopathological image classification. Med. Image Anal. 81, 102559. doi:10.1016/j.media.2022.102559

PubMed Abstract | CrossRef Full Text | Google Scholar

Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., et al. (2021). Medmnist v2 - a large-scale lightweight benchmark for 2d and 3d biomedical image classification. Sci. Data 10, 41. doi:10.1038/s41597-022-01721-8

PubMed Abstract | CrossRef Full Text | Google Scholar

Zhang, C., Cai, Y., Lin, G., and Shen, C. (2020). Deepemd: Few-shot image classification with differentiable earth mover’s distance and structured classifiers. Comput. Vis. Pattern Recognit., 12200–12210. doi:10.1109/cvpr42600.2020.01222

CrossRef Full Text | Google Scholar

Zhang, Y., Li, W., Sun, W., Tao, R., and Du, Q. (2022). Single-source domain expansion network for cross-scene hyperspectral image classification. IEEE Trans. Image Process. 32, 1498–1512. doi:10.1109/tip.2023.3243853

PubMed Abstract | CrossRef Full Text | Google Scholar

Zhao, Y., Li, Z., Wang, Z., and Chen, Y. (2024). Enhancing weld seam recognition in industrial robotics through advanced deep learning techniques. 17th Int. Sci. Pract. Conf. “The latest Technol. Dev. Sci. Bus. Education”(April 30–May 03, 2024) Lond. Great Britain. Int. Sci. Group 2024, 446. Available online at: https://books.google.com.hk/books?hl=zh-CN&lr=&id=OTEZEQAAQBAJ&oi=fnd&pg=PA390&dq=Enhancing+weld+seam+recognition++in&#+;industrial+robotics+through+advanced+deep+learning+techniques&ots=Szgf9H1JJW&sig=DrnIpUqaVZeYFotbqutg9qCHPQI&redir_esc=y#v=onepage&q=Enhancing%20weld%20seam%20recognition%20%20in%20industrial%20robotics%20through%20advanced%20deep%20learning%20techniques&f=false.

PubMed Abstract | Google Scholar

Zheng, X., Sun, H., Lu, X., and Xie, W. (2022). Rotation-invariant attention network for hyperspectral image classification. IEEE Trans. Image Process. 31, 4251–4265. doi:10.1109/tip.2022.3177322

PubMed Abstract | CrossRef Full Text | Google Scholar

Zhu, Y., Zhuang, F., Wang, J., Ke, G., Chen, J., Bian, J., et al. (2020). Deep subdomain adaptation network for image classification. IEEE Trans. Neural Netw. Learn. Syst. 32, 1713–1722. doi:10.1109/tnnls.2020.2988928

PubMed Abstract | CrossRef Full Text | Google Scholar

Keywords: microstructural analysis, deep generative models, topological learning, computational materials science, MorphoTensor

Citation: Liu H, Zhu P and Tan C (2026) Deep learning-based image classification for microstructural analysis in computational materials science. Front. Mater. 12:1648653. doi: 10.3389/fmats.2025.1648653

Received: 17 June 2025; Accepted: 27 October 2025;
Published: 20 January 2026.

Edited by:

Shahed Rezaei, Access e.V., Germany

Reviewed by:

Alexandre Viardin, Access e.V., Germany
Aleena Baby, Access e.V., Germany
Nayan Kumar Sarkar, IILM University, India

Copyright © 2026 Liu, Zhu and Tan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Penghua Zhu, aGF6ZXJhcm9naTBAaG90bWFpbC5jb20=

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.