- 1North China Institute of Aerospace Engineering, Langfang, Hebei, China
- 2Cangzhou Normal University, Cangzhou, Hebei, China
- 3Changchun University of Technology, Changchun, Jilin, China
Introduction: Recently, the integration of deep learning techniques and computational materials science has catalyzed significant advances in the microstructural analysis of materials, particularly through the lens of multiscale, high-dimensional imaging data. However, conventional models often fall short in capturing the intricate topology and spatial variability that define realistic microstructural patterns, limiting their ability to inform material property predictions, inverse design, and structural synthesis.
Methods: To overcome these challenges, we introduce an innovative deep learning framework designed for microstructural image classification and representation learning, incorporating physical, geometric, and topological constraints directly into the training process. Our method, centered on the structured generative model MorphoTensor, introduces hierarchical tensorial embeddings that retain directionality, anisotropy, and spatial locality—features crucial for realistic material modeling. We further incorporate a Topology-Aware Latent Refinement strategy, which couples persistent homology with differentiable approximations of Betti numbers to enforce topological consistency and augment microstructural diversity. Unlike existing data-driven pipelines, our framework seamlessly integrates statistical encoding, topologicalization, and latent manifold alignment within a unified architecture, ensuring robustness across diverse datasets including phase-field simulations and real microscopy data.
Results and Discussion: Empirical evaluations on benchmark and experimental datasets demonstrate that our method significantly outperforms standard convolutional and autoencoding baselines in accuracy, stability, and generalization. Moreover, our approach aligns closely with the ongoing efforts in the broader computational materials and mechanics communities to build interpretable, physically informed, and adaptable deep learning systems. These contributions illustrate the potential of structured deep generative modeling as a foundational tool for advancing intelligent microstructure analysis and design in materials informatics.
1 Introduction
The rapid advancement of computational materials science has made it possible to simulate and analyze with unprecedented accuracy the microstructural features of materials (Chen C.-F. et al., 2021). However, accurately classifying these microstructures remains a challenging task due to the intricate patterns, varying scales, and diverse morphologies present in materials data (Hong et al., 2021). Traditional image analysis techniques struggle to generalize across different material systems, leading to inconsistent performance (Maurício et al., 2023). Therefore, there is a growing necessity for more robust and adaptive methods to interpret microstructural images. Deep learning, especially convolutional neural networks (CNNs), has emerged as a powerful solution for such tasks, not only enhancing classification accuracy but also enabling the discovery of subtle structural patterns that are difficult to identify through manual or conventional computational methods (Touvron et al., 2021). The integration of deep learning into microstructural analysis holds promise for accelerating materials discovery, optimizing material properties, and improving predictive modeling capabilities (Wang et al., 2022).
Initial studies approached microstructural image interpretation through classical computer vision techniques that emphasized low-level descriptors and algorithmic rules. These methods typically relied on predefined image processing operations such as edge detection, texture analysis, and morphological transformations (Tian et al., 2020). The extracted features were then used to construct visual representations that could be manually classified or interpreted by domain experts. Such techniques offered clear interpretability and were relatively straightforward to implement, making them well-suited for early investigations into structured or periodic microstructures (Yang et al., 2021). However, their effectiveness was largely constrained to idealized or synthetic datasets, where visual patterns exhibited strong regularity and minimal noise. In real-world materials, microstructures often display high variability in scale, orientation, and contrast, compounded by imaging artifacts and inter-sample heterogeneity (Hong et al., 2020). Classical descriptors, being low level and often linear in nature, lacked the expressiveness to model these complexities. Consequently, their generalization ability across different material systems, imaging modalities, or sample preparation methods was limited. In response to these shortcomings, the field began transitioning toward more adaptive and data-responsive frameworks. Researchers introduced semi-automated pipelines that combined classical feature extraction with rule-based decision trees or clustering algorithms, aiming to reduce the burden of manual annotation while improving consistency (Sun et al., 2022). These hybrid methods offered improved flexibility and some resilience to noise and structural diversity, but they still depended heavily on expert knowledge to define relevant features and threshold values. As the demand for scalable and generalizable microstructural analysis grew—particularly in high-throughput materials discovery contexts—it became clear that more robust, data-driven modeling approaches were needed to cope with the growing complexity and volume of materials imaging data (Rao et al., 2021).
Building on this need for adaptability, researchers began integrating statistical modeling and pattern recognition techniques that allowed systems to learn from annotated examples rather than relying solely on fixed rule sets (Kim et al., 2022). This marked a methodological shift toward supervised learning paradigms, where algorithms were trained to associate input features with known output labels based on curated microstructural datasets. Methods like support vector machines (SVM), random forests, k-nearest neighbors (k-NN), and principal component analysis (PCA) have gained broad usage in tasks involving classification, clustering, and dimensionality reduction (Mai et al., 2021). These models were typically coupled with engineered feature extraction pipelines involving texture descriptors, histogram statistics, frequency domain transforms, and geometric quantifiers of microstructural morphology (Bostanabad et al., 2018). The resulting hybrid frameworks improved both prediction accuracy and computational efficiency relative to early rule-based approaches, particularly for moderately sized datasets where manual labeling was feasible. Their utility was demonstrated in tasks such as grain boundary classification, phase segmentation, and defect detection in polycrystalline or composite materials (Azizi et al., 2021). Despite these advances, the performance of such models remained tightly bound to the quality and representativeness of the input features. Because feature design was largely manual and guided by domain heuristics, important structural cues—especially those spanning multiple spatial scales or exhibiting irregular patterns—were often missed or poorly encoded (DeCost and Holm, 2015). Furthermore, these models lacked the hierarchical representation learning capacity necessary to capture complex dependencies and interactions within heterogeneous microstructures (Li et al., 2020). As a result, their generalization capability across diverse material systems, imaging resolutions, or sample preparation techniques was limited (Kalidindi and De Graef, 2015). These shortcomings highlighted the need for more expressive, automated, and data-adaptive frameworks that could learn robust feature representations directly from raw or minimally processed image data (Bhojanapalli et al., 2021).
The evolution toward fully trainable, end-to-end systems marked a significant turning point with the adoption of deep learning models (Zhang et al., 2020). Convolutional neural networks (CNNs) and their extensions enabled direct learning from raw image data, bypassing the need for handcrafted features (Roy et al., 2022). These architectures proved especially powerful in capturing hierarchical spatial patterns and adapting across different imaging settings (Zhu et al., 2020). The availability of larger datasets and computational resources further amplified their impact, allowing for deeper and more expressive models (Chen L. et al., 2021). The application of transfer learning techniques and domain-specific fine-tuning expanded their reach into data-limited areas of materials research. While challenges such as interpretability and generalization across modalities persist, the integration of deep learning with physics-guided modeling and generative frameworks continues to push the boundaries of automated microstructural analysis (Ashtiani et al., 2021).
Based on the limitations of feature dependence in ML and the data hunger and interpretability concerns of deep learning, we propose a hybrid method that leverages domain-specific inductive biases and lightweight attention-enhanced CNNs to balance performance, efficiency, and generalizability. Our approach integrates a shallow attention module that dynamically focuses on microstructural regions of interest, combined with transfer learning from a domain-specific pretraining phase, allowing the model to effectively classify diverse microstructures with fewer labeled examples. By embedding prior knowledge and enhancing feature saliency, our model addresses both the data efficiency and explainability challenges commonly faced in deep learning-based material image analysis. Furthermore, we evaluate the method across multiple datasets covering various material types and imaging resolutions, demonstrating its robustness and practical value for accelerating microstructural classification in computational materials science.
• We introduce a novel attention-augmented CNN architecture tailored for microstructural image classification, enabling dynamic focus on relevant texture features.
• The method exhibits high adaptability across different materials and imaging conditions, ensuring generalizability and efficiency in practical applications.
• Experimental results show that our model achieves superior classification accuracy with reduced training data requirements, outperforming conventional CNN baselines.
2 Related work
2.1 Convolutional networks for microstructures
The application of convolutional neural networks (CNNs) to microstructural image classification has emerged as a central theme in computational materials science. CNNs are well-suited to this domain due to their capability to capture hierarchical spatial features in image data, which is critical when analyzing complex textures and phase distributions inherent in materials microstructures (Masana et al., 2020). Early efforts focused on utilizing standard architectures such as AlexNet and VGGNet to distinguish between different grain morphologies, crystal orientations, and defect types. These models demonstrated strong performance on datasets of synthetic micrographs generated through phase-field simulations or molecular dynamics (Rezaei et al., 2025). Subsequent studies improved upon these methods by incorporating domain-specific augmentations and preprocessing techniques tailored to the nature of materials images. For, contrast normalization, orientation alignment, and noise filtering were often employed to standardize inputs and enhance feature salience (Sheykhmousa et al., 2020). Transfer learning from pretrained networks on natural image datasets such as HEDM has also been shown to significantly boost performance, particularly when labeled microstructural datasets are limited in size. Recent work has moved beyond mere classification to integrate CNNs with unsupervised learning and clustering to uncover latent structural patterns (Mascarenhas and Agarwal, 2021). Hybrid methods that combine CNN-based feature extractors with classical machine learning classifiers have proven effective in improving the generalizability of findings across diverse material systems. Such approaches underscore the adaptability of deep convolutional models in handling the heterogeneity and high dimensionality typical of microstructural data in materials informatics (Rezaei et al., 2024a).
2.2 Data augmentation and synthesis
Data scarcity remains a pressing challenge in the development of robust deep learning models for microstructural classification (Rezaei et al., 2024b). To address this, a variety of data augmentation and synthesis strategies have been employed. Basic augmentation techniques such as rotation, flipping, scaling, and elastic deformation are widely adopted to enhance model generalization and reduce overfitting. These transformations simulate the physical variability present in microstructural samples without altering their intrinsic material characteristics (Zhang et al., 2022). More sophisticated approaches leverage generative adversarial networks (GANs) to create realistic synthetic micrographs. GANs can learn the underlying distribution of microstructural images and generate high-fidelity examples that preserve critical statistical and textural properties. These synthetic datasets not only augment training corpora but also support model benchmarking under controlled conditions (Dai and Gao, 2021). Conditional GANs (cGANs) have further enabled the generation of class-specific microstructures, enhancing the diversity and utility of synthetic samples in supervised learning contexts. Another promising avenue involves the use of physics-informed simulations to generate labeled microstructural data. Phase-field modeling, Monte Carlo methods, and cellular automata simulations are commonly utilized to produce synthetic micrographs with known ground truths (Taori et al., 2020). These simulated datasets serve as a valuable source of training data, particularly for rare or experimentally inaccessible microstructural features. Integrating such data with real experimental micrographs through domain adaptation techniques can bridge the synthetic–real gap and improve model transferability to practical applications (Alotaibi et al., 2025).
2.3 Interpretable and physics-guided models
The integration of interpretability and physical priors into deep learning frameworks represents crucial research (Ru et al., 2025). Traditional CNNs, while powerful, often function as black boxes, providing little insight into the underlying material phenomena driving classification outcomes (Peng et al., 2022). To mitigate this, recent work has explored explainable AI (XAI) techniques to visualize salient features and activation maps. Methods such as Grad-CAM, Layer-wise Relevance Propagation, and occlusion sensitivity analysis have been applied to reveal which microstructural regions contribute most significantly to model predictions (Bazi et al., 2021). Parallel efforts aim to embed physical constraints directly into model architectures or loss functions. Physics-guided neural networks (PGNNs) and theory-informed loss formulations ensure that predictions are not only accurate but also consistent with known physical laws and microstructural mechanics. These approaches improve trustworthiness and facilitate integration with existing computational materials models (Zheng et al., 2022). For instance, incorporating symmetry operations, crystallographic invariants, and defect energetics into the learning pipeline enables the network to learn more meaningful and generalizable representations. Another stream of research involves the fusion of multimodal data—combining image data with scalar features such as composition, processing history, or mechanical properties (Dong H. et al., 2022). By constructing multi-input models or employing attention mechanisms, these frameworks can model complex structure–property–process relationships that govern material behavior. The emphasis on interpretability and physics consistency ensures that deep learning models serve not just as predictive tools but also as instruments for scientific discovery in materials science (Liu and Huang, 2025).
3 Methods
3.1 Overview
In this paper, we investigate the problem of microstructural analysis from a computational perspective, aiming to uncover latent patterns embedded within the complex topology and morphology of material microstructures. This problem is central to a variety of disciplines, including materials science, computational mechanics, and imaging analysis, where fine-grained structural understanding is indispensable for property prediction, synthesis, and optimization. Our method section unfolds as a comprehensive blueprint of the proposed framework, which is grounded in rigorous mathematical modeling, algorithmic innovation, and domain-specific reasoning. In the following sections, we articulate the methodology across three complementary components, each of which addresses a critical stage in the analytical process. The overall structure is designed to support a seamless transition from theoretical abstraction to practical implementation, thereby enhancing both interpretability and extensibility of the proposed pipeline. Section 3.2 lays the groundwork by introducing the essential formalism required for modeling microstructural data, and we establish the notation and mathematical foundations required to express microstructure fields, characterize their variability, and formulate the analytical objectives. These preliminaries include the symbolic encoding of spatial domains, the statistical representation of morphological features, and the formal expression of symmetry and invariance conditions. We also outline the high-level problem setting, emphasizing the role of probabilistic descriptors, topological constraints, and the challenges associated with high-dimensional microstructural manifolds. Section 3.3 introduces our core contribution—a novel generative mechanism tailored for microstructural representation learning, which we refer to as MorphoTensor. Unlike existing approaches that treat microstructure either as deterministic fields or fixed-resolution images, MorphoTensor incorporates hierarchical tensorial embeddings to preserve directional, scale-sensitive, and spatially localized information. This representation enables fine control over the expressivity and regularity of the model and accommodates domain priors such as anisotropy and periodicity. We also integrate latent Gaussian processes into the architecture to capture the uncertainty and multi-modality, ensuring robustness under incomplete or noisy observations. In Section 3.4, we introduce a complementary strategy we term Topology-Aware Augmented Encoding, which governs how microstructures are processed, interpreted, and regularized during learning. This strategy goes beyond conventional supervision or autoencoding schemes by embedding topological invariants—such as Betti numbers and persistence diagrams—into the optimization loop via differentiable approximations. This coupling between topological reasoning and geometric encoding forms a feedback system wherein local morphological consistency and global topological stability co-evolve during training. We explore a data augmentation and sampling regime inspired by persistent homology, which aids in generating diverse yet structurally coherent microstructures for both training and downstream applications.
Each of the three aforementioned sections is designed to build upon the previous one, progressively refining the microstructural analysis from abstract symbolic encoding to structured representations and then to intelligent processing strategies. The integration of these components enables a unified and extensible analysis framework capable of handling a broad spectrum of microstructural modalities—including binary phase fields, grayscale reconstructions, orientation maps, and multiphase composites. Throughout this methodological exposition, we remain anchored to the physical and statistical realities of microstructural data. This includes adherence to periodic boundary conditions, accommodation of multiscale heterogeneities, and respect for the sparsity and redundancy that typify real-world microstructures. Our framework is implemented in a modular fashion, enabling easy extension to supervised learning, inverse design, and uncertainty quantification tasks. Furthermore, the proposed methods are compatible with both synthetic benchmark datasets and empirical datasets derived from electron microscopy, X-ray tomography, and phase-field simulations. The methodology section of this paper lays out a rigorous, principled, and interpretable framework for microstructural analysis, including a formal problem encoding of microstructure variability and spatial characteristics; a generative modeling framework with structural priors and hierarchical embeddings; and a topology-aware processing strategy that couples geometric representation with topological reasoning. These components coalesce to form a holistic analytical toolkit, enabling robust learning and meaningful interpretation of complex material microstructures.
3.2 Preliminaries
Let
We define the space of admissible microstructures Equation 1 as
where
To characterize microstructure variability, we consider a probability space
Let
We define a metric
Given a dataset
where
Structural constraints are enforced through functionals
including invariants like volume fraction or periodicity.
Table 1 provides an explicit mapping between the abstract constraint functionals
3.3 MorphoTensor
To effectively model microstructural variability under geometric, physical, and topological constraints, we propose a novel generative model termed MorphoTensor. This model integrates hierarchical tensor representations with stochastic latent encoding, enabling expressivity over multiscale spatial patterns while respecting the underlying microstructural physics (As shown in Figure 1).
Figure 1. Schematic diagram of the MorphoTensor architecture. The framework consists of three modules: (1) a Hierarchical Tensor-Based Generator that synthesizes physically plausible microstructure images using multiscale tensorial convolutions; (2) a Latent Spatial Warping Mechanism that introduces deformation fields to capture heterogeneity and anisotropy in microstructures; (3) a Differentiable Regularization module that enforces structural consistency and aligns latent representations with geometric and statistical priors.
3.3.1 Hierarchical Tensor-Based Generator
Let
Figure 2. Schematic diagram of the Hierarchical Tensor-Based Generator framework. Microstructure inputs are encoded into tensor representations, followed by multiscale convolutions, attention mapping, and latent decoding. Spectral interpolation and residual connections are applied to maintain frequency-domain consistency and enhance feature expressiveness. The diagram highlights the flow of structural variables across different processing stages, supporting fine-grained feature learning and robust microstructure classification.
To capture multiscale and anisotropic textures, the generator is constructed using a hierarchy of tensor-valued convolutional layers, spectral interpolation modules, and directional filter banks.
The architecture is defined recursively as a depth-
where
Each convolutional layer
where
To preserve frequency-domain consistency across scales, we use spectral upsampling
where
To ensure that the convolutional responses reflect localized, structured phenomena such as grains, inclusions, or fibers, we enforce local energy normalization on the output of each filter bank Equation 8.
where
To maintain structural diversity across output samples, we apply instance-wise modulation to the filter responses via learned affine coefficients
which allows the generator to adjust local contrast and bias according to latent-conditioned semantics.
3.3.2 Latent Spatial Warping Mechanism
We enhance the generative capacity of the model by introducing a latent-driven spatial warping mechanism, where a coordinate deformation field modulates the geometry of the decoded microstructure. This warping function
where
The warped field is computed through a pullback operation, where the decoded unwarped field
This operation re-parameterizes the spatial layout of the field and allows the generator to model nonstationary features such as gradients, interfaces, and geometric anisotropy that cannot be captured by stationary convolutions alone.
To ensure invertibility and smoothness of the transformation, the deformation field is regularized using a Jacobian-based penalty. Let
where
We penalize extreme local distortion by enforcing a regularization on the Frobenius norm of the Jacobian deviation from identity Equation 13.
which discourages excessive stretching or folding of the coordinate map and ensures physical plausibility of the warped domain.
To preserve volume and avoid folding, we introduce a determinant-based regularizer that promotes diffeomorphic mappings Equation 14.
The total output field
3.3.3 Differentiable Topological and Manifold Regularization
To ensure topological characteristics of generated microstructures are preserved, we embed a differentiable approximation of topological invariants into the training objective. In particular, we focus on Betti numbers
where
As direct gradients through topological features are not tractable, we adopt a differentiable proxy using persistent homology. Let
where
To further regularize the geometry of the latent space, we induce a Riemannian metric
which captures the sensitivity of the generated microstructure to changes in each latent direction and encodes the intrinsic geometry of the generative map.
To discourage excessive curvature in the latent space, which may lead to poorly generalizable representations, we introduce a curvature regularizer based on the Frobenius norm of the second-order derivatives Equation 18.
To encourage smooth topological variation across samples, we introduce a pairwise consistency loss over mini-batches. Let
which penalizes abrupt topological changes and aligns the learned manifold with the topology-aware data structure.
3.4 Topology-Aware Latent Refinement
To effectively train the proposed MorphoTensor model for microstructural generation, we develop a novel strategy termed Topology-Aware Latent Refinement (TLR). This strategy leverages both data-driven loss formulation and structure-aware regularization to achieve physically consistent, statistically expressive, and topologically faithful microstructure synthesis. The TLR approach integrates multiscale supervision, adaptive perturbation schemes, and homology-aligned optimization (as shown in Figure 3).
Figure 3. Schematic diagram of the Latent Refinement module. The module integrates three components: (1) Multiscale Variational Learning, which captures both global and local structural patterns; (2) Constraint Projection and Augmentation, which incorporates physical priors and adaptive sampling; (3) Latent Decoding with topological and structural regularization. Together, these components refine latent representations to ensure accurate reconstruction, improved generalization, and physically consistent microstructure synthesis.
3.4.1 Multiscale Variational Learning
Given a dataset of microstructures
Figure 4. Schematic diagram of the Multiscale Variational Learning. The figure contains three interconnected modules including Global Patterns Expert, which leverages Mamba-style stacked linear, convolutional, and state-space modeling (SSM) blocks to capture coarse global trends, followed by a feedforward layer; Long-Short Router with Multi-Scale Patcher, where long- and short-range time series (TS) are routed to low- and high-resolution learning paths with a probability split of
Each input
where
The overall training objective follows the variational autoencoder (VAE) framework and is designed to balance accurate reconstruction with latent regularity. The total VAE loss is written Equation 21.
where
To measure reconstruction quality, we adopt a scale-adaptive loss defined over a multiresolution decomposition of the microstructure. Let
where
To further improve the expressiveness of the latent space, we inject structured noise into
which allows gradients to propagate through the stochastic sampling process during training.
Moreover, to prevent posterior collapse and enhance diversity, we regularize the mutual information between
where
The total loss incorporates all terms with tunable weights
This objective enables learning of compact, expressive, and scale-aware latent representations tailored for microstructural variability.
4 Experimental setup
4.1 Dataset
For completeness, we also consider results on generic large-scale image datasets (HEDM Muralikrishnan et al., 2023) and (HTEM Steingrimsson et al., 2023) as pretraining/transfer baselines; these details are provided in the Appendix. WELD SEAM Dataset (Zhao et al., 2024) is a fine-grained image classification dataset comprising 8,189 images of weld seams collected from various manufacturing environments. It spans 102 categories, each representing different types or conditions of weld seams. Each category contains between 40 and 258 images characterized by large variations in scale, pose, illumination, and surface finish. The dataset focuses on challenging intra-class similarity and inter-class variation as many weld seams share visual features. The images were obtained through industrial inspection systems and labeled using a combination of automated tools and expert manual verification. It has been widely used for evaluating fine-grained classification algorithms and defect detection models. The high-resolution imagery supports detailed texture and surface feature extraction, crucial for distinguishing subtle differences in weld quality. The MID Dataset (Jackson et al., 2022) contains 5,640 texture images organized into 47 categories based on human-centric attributes such as striped, dotted, fibrous, and bumpy. It emphasizes perceptual texture properties rather than object identities. Each category includes 120 images collected from diverse natural and artificial sources. The dataset supports research in texture recognition, segmentation, and attribute-based representation learning. All images are annotated according to describable attributes defined by human perception rather than material composition or object context. This makes MID suitable for studying mid-level visual attributes and for training models that interpret abstract semantic properties. The dataset challenges models to generalize texture recognition across variations in scale, illumination, and viewpoint.
In this work, EBSD orientation maps and synthetic phase-field simulations are used as the primary datasets for evaluating classification performance and topological consistency.
4.2 Experimental details
All experiments were conducted using the PyTorch framework on a server equipped with NVIDIA A100 GPUs (80 GB memory, CUDA 12.1). Mixed-precision training was adopted to accelerate convergence and reduce memory usage. For the HEDM and HTEM datasets, all images were resized to
This operator ensures that the generated microstructure maintains a target volume fraction by re-normalizing pixel intensities. The process is differentiable and integrated into the training loop, allowing physics-aware backpropagation without introducing hard constraints in Algorithm 1.
All reported results are averaged over three independent runs. 95% confidence intervals are computed using the Student’s t-distribution with 2 degrees of freedom. Statistical significance tests (two-tailed t-tests) are performed where appropriate.
For multi-class AUC computation, we adopt a macro-averaging strategy using the one-vs-rest approach, which calculates the AUC independently for each class and then takes the unweighted mean. This method is particularly suitable for imbalanced class distributions, such as those in the EBSD and phase-field datasets.
The following training configurations apply specifically to microstructural datasets (EBSD and phase-field). For these datasets, input images are resized to 256
All structural metrics are computed on normalized microstructure fields: 2-point correlation values are measured on images scaled to the range [0,1], while length-based descriptors (e.g., chord length and phase size) are expressed as fractions of the total image width.
4.3 Benchmarking against leading methods
Results on general-purpose datasets (HEDM and HTEM) are used only as generic pretraining and transfer baselines. Our primary evaluations focus on EBSD and phase-field datasets, which directly reflect the microstructural nature of the problem.
To address domain-specific validation, we incorporated two microstructure-centered datasets: (i) EBSD orientation maps from Ti-Al alloy specimens in Table 2 and (ii) synthetic phase-field simulations using the Cahn-Hilliard equation in Table 3. The following tables report classification metrics and topology-aware evaluation on these datasets.
Table 2. Classification performance on EBSD and Phase-field datasets with 95% confidence intervals. Values are reported as mean
Table 3. Microstructure-aware metric comparison on generated vs. real samples (95% confidence intervals). Values are reported as mean
Table 4 presents the results of our sensitivity analysis. The performance peak occurs at
Table 4. Sensitivity analysis of
Table 5 shows that our model preserves both semantic accuracy and topological structure under significant shifts in porosity and grain morphology. Unlike CNNs or VAEs, MorphoTensor maintains low persistence diagram distance and Euler number error, confirming its robustness in generalizing to unseen structural regimes.
Table 5. Robustness evaluation on out-of-distribution (OOD) morphologies with 95% confidence intervals. Values are reported as mean
All structural metrics are computed on normalized microstructure fields. 2-point correlation functions are evaluated over images scaled to [0, 1], and length-based descriptors (e.g., chord length and phase size) are expressed as fractions of the total image width. These conventions ensure comparability across datasets of different spatial resolutions.
4.4 Feature removal study
Results on general datasets (HEDM and HTEM) provide a better understanding of the model's baseline performance in large-scale environments.
Table 6 demonstrates the effects of different components of the MorphoTensor generative pipeline on the classification performance. The inclusion of synthetic microstructures generated under topological and physical constraints consistently improves the accuracy, recall, and AUC. Ablating topology loss or latent warping results in a degradation of performance, indicating that these modules are critical for preserving structural consistency in generated samples and enhancing classifier robustness.
Table 6. Ablation study on the impact of MorphoTensor pipeline components in classification tasks (EBSD dataset).
Table 7 presents the computational overhead introduced by the persistent homology (PH) loss during training on the EBSD dataset. Compared to the baseline configuration without PH losses, the inclusion of topology-aware regularization increases the per-epoch training time by approximately 28% and incurs a modest memory overhead of 319 MB. On a per-batch basis, the training time increases from 98.3 m to 126.0 m. Despite this increase in cost, the performance gains in topological fidelity and microstructure realism (as discussed below) justify the additional computation, especially in high-stakes applications involving materials informatics or microstructure-sensitive design tasks.
Table 8 compares the performance of three configurations on a publicly available EBSD dataset using both classification metrics and microstructure-aware topology metrics. While the CNN baseline achieves acceptable accuracy, it performs poorly on structural metrics such as two-point correlation error, chord length KL divergence (measured as a fraction of image width), and persistence diagram distance. Incorporating the MorphoTensor model without topology loss improves both accuracy and structure preservation. However, the full model with persistent homology loss achieves the best results across all metrics, indicating that topological regularization significantly enhances the geometric and physical plausibility of the generated or processed microstructures. This validates the core hypothesis of our work—that topology-aware generative modeling leads to structurally faithful representations in computational materials science.
For full reproducibility, we provide an anonymous code repository that includes training scripts, model configurations, and instructions for dataset preparation: https://snippets.cacher.io/snippet/0a9c95f0fa961047e7cd. The repository covers dataset-specific preprocessing (e.g., EBSD map cleaning and simulation of phase-field morphologies), hardware-agnostic training pipelines, and logging templates. This enables independent validation of all reported results. Upon final publication, this repository will be made publicly available under an open-source license.
5 Conclusions and future work
This work aims to overcome key shortcomings of conventional deep learning methods when applied to microstructural image analysis in the field of computational materials science. Existing methods often struggle with preserving the complex topology and spatial heterogeneity that characterize real-world material microstructures. To overcome this, we developed a novel deep learning framework built around the structured generative model MorphoTensor, which integrates physical, geometric, and topological priors into the learning process. This model introduces hierarchical tensorial embeddings that capture crucial characteristics such as directionality, anisotropy, and spatial locality. We proposed a Topology-Aware Latent Refinement (TALR) strategy, which leverages persistent homology and differentiable Betti numbers to ensure topological fidelity. This comprehensive design allows the model to unify statistical encoding, topological analysis, and latent space alignment. Our experiments, conducted across both synthetic and real microscopy datasets, confirm substantial improvements in classification accuracy, robustness, and generalization compared to conventional convolutional networks and autoencoders.
While the results are promising, there remain some limitations. First, the model’s reliance on complex topological tools such as persistent homology introduces computational overhead, which could hinder scalability in real-time or high-throughput applications. Second, although our model generalizes well across datasets, its performance on entirely unseen microstructural morphologies especially those outside the training distribution—still warrants further investigation. Looking ahead, future research should aim to enhance the computational efficiency of the TALR module and expand the framework to accommodate 3D volumetric datasets. Integrating active learning and physics-based simulation feedback loops also presents a compelling direction for enhancing the adaptability and physical validity of learned representations.
While our current framework operates on 2D microstructural images, the core components—tensorized encoding, spatial warping, topological losses, and physical constraints—generalize naturally to 3D volumetric data. Extending MorphoTensor to 3D would involve using 3D convolutional operators in the encoder and decoder, volumetric warping fields regularized via 3D Jacobian determinants, and persistent homology computed over voxelized inputs. Notably, efficient algorithms for computing persistent diagrams in 3D exist and can be integrated into the current training pipeline. This direction is particularly promising for applications involving X-ray tomography, 3D EBSD, or synthetic phase-field volumes and represents a key avenue for future work.
Beyond the architectural and learning-based contributions, our framework offers tangible benefits for downstream materials science workflows. For instance, the accurate preservation of chord-length distributions directly supports permeability estimation in porous materials and fatigue life modeling in polycrystalline alloys, where feature spacing and persistence affect crack initiation. Similarly, the ability to reproduce orientation distributions from EBSD-like inputs enhances the predictive modeling of anisotropic mechanical properties, such as elastic modulus and thermal expansion. By maintaining topological consistency and structural diversity, our method strengthens microstructure–property linkages critical to alloy design loops, defect screening, and performance certification in computational materials pipelines. These connections highlight the broader utility of topology-aware generative modeling in practical materials informatics tasks.
While additional experiments on HEDM and HTEM datasets are included in the Appendix as transfer baselines, the main evidence of our method is provided by EBSD and phase-field datasets, which are most representative of microstructural analysis.
Data availability statement
The original contributions presented in the study are included in the article/supplementary material. The code supporting the findings of this study is available via Cacher: https://snippets.cacher.io/snippet/0a9c95f0fa961047e7cd. Further inquiries can be directed to the corresponding author.
Author contributions
HL: Conceptualization, Methodology, Software, Validation, Writing – original draft. PZ: Formal analysis, Investigation, Data curation, Writing – original draft. CT: Writing – original draft, writing – review and editing, Visualization, Supervision, Funding acquisition.
Funding
The authors declare that financial support was received for the research and/or publication of this article. Research Projects of the 14th Five-Year Plan for Educational Science under the Hebei Provincial Department of Education: Research on the Construction of Industry-Education Integration Community in the Field of Aerospace Cybersecurity Testing (No. 2503105).
Conflict of interest
The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The authors declare that no Generative AI was used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
Alotaibi, J. G., Eid Alajmi, A., Kadirgama, K., Samylingam, L., Aslfattahi, N., Kok, C. K., et al. (2025). Enhancing engine oil performance with graphene-cellulose nanoparticles: insights into thermophysical properties and tribological behavior. Front. Mater. 12, 1549117. doi:10.3389/fmats.2025.1549117
Ashtiani, F., Geers, A. J., and Aflatouni, F. (2021). An on-chip photonic deep neural network for image classification. Nature 606, 501–506. doi:10.1038/s41586-022-04714-0
Azizi, S., Mustafa, B., Ryan, F., Beaver, Z., Freyberg, J., Deaton, J., et al. (2021). “Big self-supervised models advance medical image classification,” in IEEE international conference on computer vision.
Bazi, Y., Bashmal, L., Rahhal, M. M. A., Dayil, R. A., and Ajlan, N. A. (2021). Vision transformers for remote sensing image classification. Remote Sens. 13, 516. doi:10.3390/rs13030516
Bhojanapalli, S., Chakrabarti, A., Glasner, D., Li, D., Unterthiner, T., and Veit, A. (2021). “Understanding robustness of transformers for image classification,” in IEEE international conference on computer vision.
Bostanabad, R., Zhang, Y., Li, X., Kearney, T., Brinson, L. C., Apley, D. W., et al. (2018). Computational microstructure characterization and reconstruction: review of the state-of-the-art techniques. Prog. Mater. Sci. 95, 1–41. doi:10.1016/j.pmatsci.2018.01.005
Chen, C.-F., Fan, Q., and Panda, R. (2021a). “Crossvit: Cross-attention multi-scale vision transformer for image classification,” in IEEE international conference on computer vision.
Chen, L., Li, S., Bai, Q., Yang, J., Jiang, S., and Miao, Y. (2021b). Review of image classification algorithms based on convolutional neural networks. Remote Sens. 13, 4712. doi:10.3390/rs13224712
Dai, Y., Gao, Y., and Liu, F. (2021). Transmed: transformers advance multi-modal medical image classification. Diagnostics 11, 1384. doi:10.3390/diagnostics11081384
DeCost, B. L., and Holm, E. A. (2015). A computer vision approach for automated analysis and classification of microstructural image data. Comput. Mater. Sci. 110, 126–133. doi:10.1016/j.commatsci.2015.08.011
Dong, H., Zhang, L., and Zou, B. (2022a). Exploring vision transformers for polarimetric sar image classification. IEEE Trans. Geoscience Remote Sens. 60, 1–15. doi:10.1109/tgrs.2021.3137383
Dong, X., Bao, J., Zhang, T., Chen, D., Gu, S., Zhang, W., et al. (2022b). Clip itself is a strong fine-tuner: achieving 85.7% and 88.0% top-1 accuracy with vit-b and vit-l on imagenet. arXiv Prepr. arXiv:2212.06138. Available online at: https://arxiv.org/abs/2212.06138.
Elpeltagy, M., and Sallam, H. (2021). Automatic prediction of covid- 19 from chest images using modified resnet50. Multimedia tools Appl. 80, 26451–26463. doi:10.1007/s11042-021-10783-6
Feng, J., Tan, H., Li, W., and Xie, M. (2022). “Conv2next: reconsidering conv next network design for image recognition,” in 2022 international conference on computers and artificial intelligence technologies (CAIT) (IEEE), 53–60. Available online at: https://ieeexplore.ieee.org/abstract/document/10072172.
Hong, D., Gao, L., Yao, J., Zhang, B., Plaza, A., and Chanussot, J. (2020). Graph convolutional networks for hyperspectral image classification. IEEE Trans. Geoscience Remote Sens. 59, 5966–5978. doi:10.1109/tgrs.2020.3015157
Hong, D., Han, Z., Yao, J., Gao, L., Zhang, B., Plaza, A., et al. (2021). Spectralformer: rethinking hyperspectral image classification with transformers. IEEE Trans. Geoscience Remote Sens. 60, 1–15. doi:10.1109/tgrs.2021.3130716
Jackson, J., Owsiak, A. P., Goertz, G., and Diehl, P. F. (2022). Getting to the root of the issue (s): expanding the study of issues in mids (the mid-issue dataset, version 1.0). J. Confl. Resolut. 66, 1514–1542. doi:10.1177/00220027221080967
Kalidindi, S. R., and De Graef, M. (2015). Materials data science: current status and future outlook. Annu. Rev. Mater. Res. 45, 171–193. doi:10.1146/annurev-matsci-070214-020844
Kim, H. E., Cosa-Linan, A., Santhanam, N., Jannesari, M., Maros, M., and Ganslandt, T. (2022). Transfer learning for medical image classification: a literature review. BMC Medical Imaging.Available online at: https://link.springer.com/article/10.1186/s12880-022-00793-7.
Koonce, B. (2021a). “Efficientnet,” in Convolutional neural networks with swift for tensorflow: image recognition and dataset categorization (Springer), 109–123. Available online at: https://link.springer.com/book/10.1007/978-1-4842-6168-2.
Koonce, B. (2021b). “Mobilenetv3,” in In Convolutional neural networks with swift for tensorflow: image recognition and dataset categorization. Springer, 125–144. Available online at: https://link.springer.com/book/10.1007/978-1-4842-6168-2.
Li, B., Li, Y., and Eliceiri, K. (2020). “Dual-stream multiple instance learning network for whole slide image classification with self-supervised contrastive learning,” in Computer vision and pattern recognition.
Liu, G., and Huang, H. (2025). Research progress of amorphous micro-nano structured materials. Front. Mater. 12, 1589830. doi:10.3389/fmats.2025.1589830
Mai, Z., Li, R., Jeong, J., Quispe, D., Kim, H. J., and Sanner, S. (2021). Online continual learning in image classification: an empirical survey. Neurocomputing 469, 28–51. doi:10.1016/j.neucom.2021.10.021
Masana, M., Liu, X., Twardowski, B., Menta, M., Bagdanov, A. D., and van de Weijer, J. (2020). Class-incremental learning: survey and performance evaluation on image classification. IEEE Trans. Pattern Analysis Mach. Intell. 45, 5513–5533. doi:10.1109/tpami.2022.3213473
Mascarenhas, S., and Agarwal, M. (2021). “A comparison between vgg16, vgg19 and resnet50 architecture frameworks for image classification,” in 2021 international conference on disruptive technologies for multi-disciplinary research and applications (CENTCON).
Maurício, J., Domingues, I., and Bernardino, J. (2023). Comparing vision transformers and convolutional neural networks for image classification: a literature review. Appl. Sci. Available online at: https://www.mdpi.com/2076-3417/13/9/5521.
Muralikrishnan, V., Liu, H., Yang, L., Conry, B., Marvel, C. J., Harmer, M. P., et al. (2023). Observations of unexpected grain boundary migration in srtio3. Scr. Mater. 222, 115055. doi:10.1016/j.scriptamat.2022.115055
Peng, J., Huang, Y., Sun, W., Chen, N., Ning, Y., and Du, Q. (2022). Domain adaptation in remote sensing image classification: a survey. IEEE J. Sel. Top. Appl. Earth Observations Remote Sens. 15, 9842–9859. doi:10.1109/jstars.2022.3220875
Rao, Y., Zhao, W., Zhu, Z., Lu, J., and Zhou, J. (2021). Global filter networks for image classification. Neural Inf. Process. Syst. Available online at: https://proceedings.neurips.cc/paper/2021/hash/07e87c2f4fc7f7c96116d8e2a92790f5-Abstract.html.
Rezaei, S., Asl, R. N., Taghikhani, K., Moeineddin, A., Kaliske, M., and Apel, M. (2024a). Finite operator learning: bridging neural operators and numerical methods for efficient parametric solution and optimization of pdes. arXiv Prepr. arXiv:2407.04157. Available online at: https://arxiv.org/abs/2407.04157.
Rezaei, S., Moeineddin, A., and Harandi, A. (2024b). Learning solutions of thermodynamics-based nonlinear constitutive material models using physics-informed neural networks. Comput. Mech. 74, 333–366. doi:10.1007/s00466-023-02435-3
Rezaei, S., Asl, R. N., Faroughi, S., Asgharzadeh, M., Harandi, A., Koopas, R. N., et al. (2025). A finite operator learning technique for mapping the elastic properties of microstructures to their mechanical deformations. Int. J. Numer. Methods Eng. 126, e7637. doi:10.1002/nme.7637
Roy, S. K., Deria, A., Hong, D., Rasti, B., Plaza, A., and Chanussot, J. (2022). Multimodal fusion transformer for remote sensing image classification. IEEE Trans. Geoscience Remote Sens. 61, 1–20. doi:10.1109/tgrs.2023.3286826
Ru, J., Fang, Y., Guo, Y., Jiang, L., Liu, H., Li, X., et al. (2025). Rheology, permeability and microstructure of seawater-based slurry for slurry shield tunneling: insights from laboratory tests. Front. Mater. 12, 1592537. doi:10.3389/fmats.2025.1592537
Sheykhmousa, M., Mahdianpari, M., Ghanbari, H., Mohammadimanesh, F., Ghamisi, P., and Homayouni, S. (2020). Support vector machine versus random forest for remote sensing image classification: a meta-analysis and systematic review. IEEE J. Sel. Top. Appl. Earth Observations Remote Sens. 13, 6308–6325. doi:10.1109/jstars.2020.3026724
Steingrimsson, B., Agrawal, A., Fan, X., Kulkarni, A., Thoma, D., and Liaw, P. (2023). Construction of multi-dimensional functions for optimization of additive-manufacturing process parameters. arXiv Prepr. arXiv:2311.06398. Available online at: https://arxiv.org/abs/2311.06398
Sun, L., Zhao, G., Zheng, Y., and Wu, Z. (2022). Spectral–spatial feature tokenization transformer for hyperspectral image classification. IEEE Trans. Geoscience Remote Sens. 60, 1–14. doi:10.1109/tgrs.2022.3144158
Taori, R., Dave, A., Shankar, V., Carlini, N., Recht, B., and Schmidt, L. (2020). Measuring robustness to natural distribution shifts in image classification. Neural Inf. Process. Syst. Available online at: https://proceedings.neurips.cc/paper/2020/hash/d8330f857a17c53d217014ee776bfd50-Abstract.html.
Tian, Y., Wang, Y., Krishnan, D., Tenenbaum, J., and Isola, P. (2020). “Rethinking few-shot image classification: a good embedding is all you need?,” in European conference on computer vision.
Touvron, H., Bojanowski, P., Caron, M., Cord, M., El-Nouby, A., Grave, E., et al. (2021). Resmlp: feedforward networks for image classification with data-efficient training. IEEE Trans. Pattern Analysis Mach. Intell. 45, 5314–5321. doi:10.1109/tpami.2022.3206148
Touvron, H., Cord, M., and Jégou, H. (2022). “Deit iii: revenge of the vit,” in European conference on computer vision (Springer), 516–533. doi:10.1007/978-3-031-20053-3_30
Wang, X., Yang, S., Zhang, J., Wang, M., Zhang, J., Yang, W., et al. (2022). Transformer-based unsupervised contrastive learning for histopathological image classification. Med. Image Anal. 81, 102559. doi:10.1016/j.media.2022.102559
Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., et al. (2021). Medmnist v2 - a large-scale lightweight benchmark for 2d and 3d biomedical image classification. Sci. Data 10, 41. doi:10.1038/s41597-022-01721-8
Zhang, C., Cai, Y., Lin, G., and Shen, C. (2020). Deepemd: Few-shot image classification with differentiable earth mover’s distance and structured classifiers. Comput. Vis. Pattern Recognit., 12200–12210. doi:10.1109/cvpr42600.2020.01222
Zhang, Y., Li, W., Sun, W., Tao, R., and Du, Q. (2022). Single-source domain expansion network for cross-scene hyperspectral image classification. IEEE Trans. Image Process. 32, 1498–1512. doi:10.1109/tip.2023.3243853
Zhao, Y., Li, Z., Wang, Z., and Chen, Y. (2024). Enhancing weld seam recognition in industrial robotics through advanced deep learning techniques. 17th Int. Sci. Pract. Conf. “The latest Technol. Dev. Sci. Bus. Education”(April 30–May 03, 2024) Lond. Great Britain. Int. Sci. Group 2024, 446. Available online at: https://books.google.com.hk/books?hl=zh-CN&lr=&id=OTEZEQAAQBAJ&oi=fnd&pg=PA390&dq=Enhancing+weld+seam+recognition++in&#+;industrial+robotics+through+advanced+deep+learning+techniques&ots=Szgf9H1JJW&sig=DrnIpUqaVZeYFotbqutg9qCHPQI&redir_esc=y#v=onepage&q=Enhancing%20weld%20seam%20recognition%20%20in%20industrial%20robotics%20through%20advanced%20deep%20learning%20techniques&f=false.
Zheng, X., Sun, H., Lu, X., and Xie, W. (2022). Rotation-invariant attention network for hyperspectral image classification. IEEE Trans. Image Process. 31, 4251–4265. doi:10.1109/tip.2022.3177322
Keywords: microstructural analysis, deep generative models, topological learning, computational materials science, MorphoTensor
Citation: Liu H, Zhu P and Tan C (2026) Deep learning-based image classification for microstructural analysis in computational materials science. Front. Mater. 12:1648653. doi: 10.3389/fmats.2025.1648653
Received: 17 June 2025; Accepted: 27 October 2025;
Published: 20 January 2026.
Edited by:
Shahed Rezaei, Access e.V., GermanyReviewed by:
Alexandre Viardin, Access e.V., GermanyAleena Baby, Access e.V., Germany
Nayan Kumar Sarkar, IILM University, India
Copyright © 2026 Liu, Zhu and Tan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Penghua Zhu, aGF6ZXJhcm9naTBAaG90bWFpbC5jb20=
Haiyan Liu1