
REVIEW article

Front. Mater., 06 January 2026

Sec. Computational Materials Science

Volume 12 - 2025 | https://doi.org/10.3389/fmats.2025.1669229

Data-driven AI approaches for screening high-efficiency, stable, and lead-free perovskite photovoltaic materials: a review

Beibei Wang1, Juan Wang1*, Liping Li2, Dengwu Wang1
  • 1Xi’an Key Laboratory of Advanced Photo-Electronics Materials and Energy Conversion Device, School of Electronic Information, Xijing University, Xi’an, China
  • 2Beijing Chaoyang Foreign Language School, Junior High School Physics Group, Beijing, China

With the global increase in energy demand and environmental awareness, it has become crucial to develop new types of energy materials that are efficient, stable and environmentally friendly. Lead-free perovskite materials have garnered attention due to their unique crystal structure (ABX3) and photoelectric properties, demonstrating great potential for applications such as photovoltaics, photodetectors, catalysis, and display lighting. However, the lead toxicity of traditional lead-containing perovskite materials limits their large-scale commercialization. Research on stable and non-toxic lead-free perovskite materials has therefore become a hot topic. In recent years, artificial intelligence (AI) technology has transformed the study of perovskite materials. This review focuses on the application of AI in lead-free perovskite research, including data collection, preprocessing, feature extraction, model training and prediction, inverse design and experimental verification. It aims to show how AI technologies can drive data-informed and inverse-designed discovery processes, thereby improving the efficiency and success rate of screening, development, and performance optimization of lead-free perovskite materials.

1 Introduction

1.1 Background of the study

Global energy demands and environmental concerns drive the need for efficient, stable, and eco-friendly energy materials. Perovskites (ABX3, where A = organic/inorganic cation, B = metal cation, X = halogen anion) exhibit exceptional optoelectronic properties (high absorption, high carrier mobility, and a tunable bandgap), making them promising for diverse applications (Figure 1). In photovoltaics, they offer high efficiency and low-cost processing (Zhang et al., 2025). For catalysis, they enable environmental purification and chemical synthesis (Singh et al., 2025; Irshad et al., 2022; Yoshida et al., 2014). PeLEDs show potential in displays/lighting due to high luminescence and colour purity (Chen et al., 2024a; Liu Y. et al., 2025), while perovskite photodetectors excel in optical communication and imaging (Liang et al., 2025; Ma Y. et al., 2025; Wang X. et al., 2024). Their optical gain supports laser applications (Shi et al., 2025; Guan et al., 2025), and high carrier mobility enables flexible transistors (Yu et al., 2018; Dong et al., 2021; Yang et al., 2024). Resistive/ferroelectric memories (Guo et al., 2023; Di et al., 2021), sensors (Pathak et al., 2025), actuators (Li et al., 2022), energy harvesters (Mangi et al., 2025), and X-ray detectors (via heavy-element compositions) (Jiang et al., 2022a) further highlight their versatility. However, the Pb2+ in traditional lead-based perovskites (such as FAPbI3 and MAPbI3) is biologically toxic and accumulates in the environment. During production, application, and disposal, it may leak into soil and water bodies, causing long-term harm to ecosystems (Ma X. T. et al., 2025). Furthermore, it may cause irreversible damage to the human nervous and hematopoietic systems (Leccisi and Fthenakis, 2024), a critical issue that seriously hinders commercialisation (Maietta et al., 2025). To address this, researchers have developed a range of multi-component lead-free alternative systems.
The structures and photovoltaic properties of bismuth (Bi)-based derivatives (such as Cs3Bi2I9) are similar to those of lead-based materials, and they are considered among the most promising candidates. However, it should be noted that some of these compounds may still have toxicity issues (Maietta et al., 2025; Lang et al., 2024). The tin (Sn)-based system (such as FASnI3) has a bandgap close to the ideal value for photovoltaics and has lower toxicity (Liu T. H. et al., 2025). Antimony (Sb)-based derivatives (such as Cs3Sb2I9) exhibit excellent photostability and low defect density, making them important supplementary materials in the field of photoelectric detection (Tang et al., 2021). Research on these lead-free systems provides important directions for the environmentally friendly applications of perovskite materials.

Figure 1
Circular infographic illustrating six applications of a central molecular structure: photodetectors, lasers, catalysts, PeLEDs, transistors, and solar cells. Each segment contains a relevant image or diagram.

Figure 1. Application areas of perovskite materials.

Traditional experimental methods for perovskite development are time-consuming and costly, while high-precision computational approaches are computationally demanding. Both struggle to efficiently explore the perovskite chemical space, slowing the research and development process. Despite the broad application prospects of perovskite materials, the field currently faces challenges such as fragmented technological paths, ambiguous predictive models, and low screening efficiency. This paper reviews research progress to support technological breakthroughs. The structure is as follows: the first section explains how materials development is constrained by lead toxicity and inefficient research and development, and sets out the role of artificial intelligence (AI) and its pathways; the second section introduces experimental databases of perovskite materials, high-throughput computational databases, as well as literature mining and multiscale computational techniques; sections three to five systematically outline feature extraction pathways and bottlenecks, detail machine learning predictive models and their applications, and describe how screening methods reduce cost and improve efficiency; the sixth section provides an overview of predictive research practices in seven major applications; the seventh section analyzes current model issues and forecasts technological advancements; the final section summarizes the changing role of AI, emphasizing interdisciplinary value and clarifying future directions.

Machine learning (ML) and high-throughput computing address these limitations by enabling rapid virtual screening and property prediction before experimentation (Alam and Prasad, 2025). AI integrates the Design-Build-Test-Learn (DBTL) cycle into a closed loop: it accelerates material design via predictive modeling, automates synthesis, streamlines characterization, and iteratively improves performance through feedback. For instance, Omidvar et al. (2024) used ML and automation to expedite perovskite discovery for wireless sensors, reducing errors. AI-driven high-throughput techniques thus accelerate both material discovery and industrial-scale development.

1.2 Significance of the study

Significant progress has been made in perovskite materials research (e.g., Mao and Xiang, 2025; Chen et al., 2023; Chen J. L. et al., 2022). However, dedicated studies on lead-free perovskites remain insufficient. This review focuses exclusively on lead-free perovskites, examining the compatibility between their material properties and machine learning methodologies. Figure 2 presents the typical workflow of materials machine learning. The review aims to address challenges such as multi-objective coupling and data scarcity, providing theoretical and practical guidance for the development of environmentally friendly perovskites. It covers multiple methodologies, including semi-supervised learning and transfer learning, that help balance performance, cost, and stability requirements, which highlights the necessity of this investigation.

Figure 2
Flowchart depicting a machine learning workflow in materials science, including data preprocessing with databases and scientific data, feature engineering with attributes like electronegativity, model validation using algorithms such as XGBoost and SVM, and model deployment featuring virtual screening and online prediction tools.

Figure 2. The general workflow in materials machine learning.

Artificial intelligence (AI) offers novel approaches for perovskite research and development: rapid prediction of material stability through data modelling and high-throughput screening (Zhao et al., 2021). Siad et al. (2024) confirmed the exceptional thermal stability of the K2NaInX6 double perovskite via DFT. Wu et al. (2024) achieved precise bandgap prediction by integrating experimental and DFT data through transfer learning. The combination of AI and DFT can also predict defect transition energies. Future integration with automated platforms will drive on-demand perovskite design, accelerating commercialisation in optoelectronics and photovoltaics.

In recent years, the potential of data-driven and model-driven fusion approaches has become increasingly evident. The framework developed by Cheng et al. (2025), integrating feature engineering with active learning, enhances the accuracy of performance prediction. This advances research and development from a trial-and-error approach towards rational design, accelerating material discovery while offering novel pathways for elucidating structure-function relationships.

1.3 Outline the technology pathway for AI in perovskite research

In the field of perovskite research, the application of AI technology encompasses the entire process, including data collection, preprocessing, feature selection and extraction, model training and prediction, inverse design, experimental validation, and feedback. The AI-driven perovskite development process integrates multi-source data and extracts key electronic/crystal features through statistical analysis. ML models then correlate these features with performance metrics, optimized via cross-validation. For inverse design, genetic algorithms (Yang J. et al., 2023) and generative AI create novel crystal structures meeting target specifications. Automated high-throughput experiments validate AI-designed materials, with experimental feedback continuously improving models, forming a closed-loop research and development system that accelerates discovery from data to deployment.
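The cross-validation step mentioned above can be illustrated with a minimal sketch of how k-fold index splits are produced. The function name `k_fold_indices` and the sample count are hypothetical and chosen only for illustration; practical workflows would usually shuffle the data first and rely on an established library.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# Ten hypothetical samples split into three folds (sizes 4, 3, 3).
folds = list(k_fold_indices(10, 3))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

Each sample appears in exactly one test fold, so every data point contributes to both training and evaluation across the k rounds.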

2 Perovskite materials database

In order to solve the problem of data dispersion in the research of perovskite materials, this section introduces experimental databases and high-throughput computing databases, and combines literature mining techniques with multiscale computational methods to provide key technical support for the performance prediction of such materials and the development of new materials.

2.1 Experimental database

Building a comprehensive and efficient perovskite materials database is crucial for materials research, property prediction, and application development. The database integrates experimental data, computational data, and literature information, providing researchers with rich resources to accelerate the discovery and optimisation of new materials. Table 1 lists commonly used databases for perovskite materials. The Inorganic Crystal Structure Database (ICSD) is the largest database of experimental inorganic crystal structures, with over 240,000 entries covering cell parameters, synthesis conditions, etc., useful for structural analysis, synthesis optimization, and theoretical validation. The Packet Crystal Photovoltaic Materials Database (PCPMD) compiles over 20,000 data points on packet-crystal solar cells for material screening and comparison. The Inclusion Database, developed by Springer Materials and Duke University, provides detailed information on organic-inorganic hybrid inclusion photomaterials, aiding material property analysis and new material development. The Materials Platform for Data Science (MPDS), featuring the PAULING FILE database, aggregates data on over one million inorganic materials, offers search and analysis tools, and leverages machine learning and ab initio simulation for data-driven materials design.

Table 1

Table 1. Database of commonly used materials.

2.2 High-throughput computational database

Key materials databases support perovskite research: (1) The Materials Genome Project provides experimental and high-throughput computational data including crystal/band structures; (2) Automatic Flow for Materials Discovery (AFLOW) automates large-scale calculations for structural optimization, elastic properties, and stability analysis; (3) The Technical University of Denmark’s Computational Materials Repository (open-source, Creative Commons licensed) stores electronic structure data spanning structural, electronic, elastic, thermodynamic, magnetic and optical properties, with web/Python interfaces for retrieval and analysis, supporting materials design, multiscale simulation and machine learning applications.

2.3 Literature mining and structured data

Perovskite research leverages extensive literature from databases (Web of Science, Scopus, CNKI) and preprint platforms. Effective retrieval requires strategic keyword selection, Boolean operators, and quality filters (journal impact factors, author credentials). Natural language processing (NLP) techniques enable automated extraction of research articles via web crawlers for text mining. Figure 3 outlines a complete machine learning workflow, covering literature review, text preprocessing, feature extraction, model training, deployment, and final result visualization and analysis. Zhang J. et al. (2024) developed an advanced entity recognition method combining MatBERT embeddings, Convolutional Neural Networks (CNN) feature extraction, and Conditional Random Fields (CRF) decoding, achieving 1%–6% performance gains. This approach successfully identified 2,389 key entities from perovskite literature, significantly enhancing knowledge extraction efficiency.
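As a much simpler stand-in for the learned entity recognition described above, the following sketch pulls candidate chemical formulas out of raw text with a regular expression. It is not the MatBERT-CNN-CRF method; the pattern, the function name `extract_formulas`, and the sample sentence are assumptions made for illustration, and the pattern does not check that each token is a real element symbol.

```python
import re
from collections import Counter

# Hypothetical pattern: two or more "capital letter + optional lowercase letter
# + optional count" units, with at least one digit somewhere in the token so
# that plain acronyms such as DFT or PCA are skipped.
FORMULA_PATTERN = re.compile(r"\b(?=\w*\d)(?:[A-Z][a-z]?\d*){2,}\b")


def extract_formulas(text):
    """Count formula-like tokens found in a piece of text."""
    return Counter(FORMULA_PATTERN.findall(text))


# Made-up sentence used only to exercise the pattern.
sample = ("Cs2AgBiBr6 and FASnI3 were compared with MAPbI3 using DFT and PCA; "
          "FASnI3 was measured twice.")
counts = extract_formulas(sample)
print(counts)
```

A rule-based filter like this is cheap and transparent, but it cannot resolve context (for example, distinguishing a material from a sample label), which is where learned sequence models become useful.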

Figure 3

Figure 3. Natural language processing (NLP) technology flowchart.

2.4 Multi-scale computational methods for training data generation

To overcome the limitations of Density Functional Theory (DFT) in describing specific properties such as band gaps and optical absorption, multi-scale computational methods are increasingly being applied to generate diverse, high-quality training data for machine learning models. These methods integrate different theoretical levels, achieving a balance between accuracy and computational cost, and are optimised for specific property prediction tasks. Table 2 summarises the theoretical levels corresponding to different computational methods and their advantages in terms of machine learning training data.

Table 2

Table 2. Common computational methods in the field of materials calculation.

3 Feature extraction

Feature extraction is the key link between material properties and performance. Leveraging data science and AI, it has become central to accelerating material development. There are currently five major technical paths. This section outlines the core aspects and bottlenecks, providing references for perovskite research and development.

3.1 Feature extraction based on database and computational tools

Feature extraction from materials databases accelerates perovskite discovery by providing critical structural and electronic property data. ML tools and DFT calculations enable efficient extraction of key descriptors like lattice parameters and band structures (Hashimoto et al., 2025; Dean et al., 2021; Ward et al., 2018). Chen et al. (2024b) demonstrated this approach using 6,380 perovskites from the Materials Project, generating 300 descriptors via Matminer for bandgap prediction. While this enables high-throughput analysis of large datasets, challenges include data quality control, computational costs, and the need for feature selection to prevent overfitting from excessive descriptors.

3.2 Statistical and machine learning oriented feature extraction

Statistical and machine learning-based feature extraction plays a vital role in materials science by identifying performance-critical descriptors and reducing dimensionality. Methods like correlation analysis and Recursive Feature Elimination (RFE) enable efficient feature selection. Figure 4 shows a high-performance classification model based on molecular features, which builds on the work of Zhao and Wang (2022) and is designed to classify the formability and stability of perovskite materials. Figures 4a–f compare the models trained with different numbers of features (21, 16, and 17) and confirm that, after redundant features are removed, the model still maintains a low false positive rate (0-1). This indicates that feature dimensionality reduction can prevent overfitting and enhance the generalization ability of the model. Li et al. (2018) demonstrated this by predicting perovskite oxide stability using just 70 of 791 features. Similarly, Pilania et al. (2016) identified 16 key descriptors from 47 for double perovskite bandgap prediction using Lasso regression. While these approaches enhance model efficiency and interpretability, challenges remain: reduced features may lose physicochemical meaning, improper selection risks overfitting, and computational complexity increases with dataset size.
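The correlation-analysis route to feature selection can be sketched as a simple filter that drops any feature too strongly correlated with one already kept. This is a plain correlation filter, not RFE or Lasso; the feature names, the values, and the 0.95 threshold are made up for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def drop_redundant(features, threshold=0.95):
    """Greedily keep a feature only if |r| with every kept feature is <= threshold."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Made-up values for five hypothetical compounds.
features = {
    "tolerance_factor": [0.85, 0.90, 0.95, 0.88, 1.00],
    "tolerance_factor_scaled": [8.5, 9.0, 9.5, 8.8, 10.0],  # exact multiple, so redundant
    "electronegativity_diff": [1.2, 0.7, 1.5, 0.9, 1.1],
}
selected = drop_redundant(features)
print(selected)
```

The greedy order matters: the first feature of a correlated pair is the one retained, so in practice features are often ranked by domain relevance or importance before filtering.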

Figure 4
Various graphics illustrating data analysis, including confusion matrices with different feature counts, ROC curves demonstrating model performance, feature importance bar graphs, and a colorful heatmap showing numerical correlations between variables. The confusion matrices display true negatives, true positives, false negatives, and false positives. The ROC curves assess predictive accuracy. Feature importance is ranked by bar length. The heatmap visualizes correlations with a color scale from negative to positive.

Figure 4. Confusion matrices (a–c) and ROC curves (d–f) for the perovskite formability prediction models trained with 21, 16, and 17 features, respectively. Feature importance of the 21 features in predicting the (g) formability and (h) stability of perovskites. (i) Pearson correlation coefficients between the 21 features. Reprinted with permission from ref (Zhao and Wang, 2022).

3.3 Feature extraction with chemical and physical properties as the basis

The extraction of perovskite features is centred on the chemical composition and is divided into three levels: the elemental level collects atomic weights, radii, valence electrons, electronegativity, etc., to reveal the bonding and electronic structure; the structural level extracts tolerance factors, bond lengths, bond angles, lattice constants, and octahedral distortions to correlate with the bandgap and carrier behaviour; and the thermodynamic level calculates enthalpies, entropies, free energies, and defect energies to assess phase stability and synthesis feasibility. The three types of features combine into a three-level descriptor set that supports cross-scale performance prediction. Figures 4g,h display the importance ranking of 21 features. The ranking indicates that elemental-level features have a decisive impact on the stability of chemical bonds, while structural-level features affect the integrity of the crystal structure. Both types of features jointly influence the formation energy and stability of perovskites. Thoppil and Alankar (2022) evaluated the predictive capability and stability of features by collecting physical properties, thermodynamic parameters, and other relevant data. This method is comprehensive, has clear physical significance, and is easy to extend, providing a theoretical basis for the prediction and improvement of material characteristics. However, it also faces challenges, such as high computational costs and difficulties in obtaining certain property data.
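One of the structural-level descriptors named above, the tolerance factor, is commonly computed with the Goldschmidt formula t = (r_A + r_X) / (sqrt(2) (r_B + r_X)). The sketch below evaluates it for a CsSnI3-like composition; the ionic radii are approximate illustrative values rather than reference data, and the 0.8 to 1.0 window is a widely quoted rule of thumb assumed here, not a result from this review.

```python
from math import sqrt


def goldschmidt_tolerance_factor(r_a, r_b, r_x):
    """Goldschmidt tolerance factor for an ABX3 composition (radii in the same unit)."""
    return (r_a + r_x) / (sqrt(2) * (r_b + r_x))


def in_rule_of_thumb_window(t, lower=0.8, upper=1.0):
    """Assumed heuristic window in which a perovskite structure is often expected."""
    return lower <= t <= upper


# Approximate, illustrative ionic radii in angstroms (assumptions).
r_a, r_b, r_x = 1.88, 1.10, 2.20
t = goldschmidt_tolerance_factor(r_a, r_b, r_x)
print(round(t, 3), in_rule_of_thumb_window(t))
```

Because the descriptor needs only tabulated radii, it can be generated for very large candidate lists before any DFT calculation is run.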

3.4 Feature processing under clustering and dimensionality reduction techniques

Feature processing based on clustering and dimensionality reduction applies both techniques to feature data. Cluster analysis groups feature data with clustering algorithms to reduce multicollinearity among features and reveal implicit patterns. Meanwhile, Principal Component Analysis (PCA) and other dimensionality reduction techniques compress high-dimensional feature data into fewer dimensions, which facilitates model visualization and improves training efficiency.

Bhattacharya and Roy (2023) downsampled the data using Local Feature Analysis (LFA) and PCA to analyze the underlying structures and features, which in turn revealed similarities and differences between different chalcocite materials. Jin et al. (2025) developed a Transformer-based ct-UAE method to predict the properties of perovskite materials. They used the UMAP algorithm to reduce the dimensionality, mapping high-dimensional atomic embeddings into a two-dimensional space, and then classified them into three categories using the K-means algorithm. The advantages of ct-UAE include significant dimensionality reduction, deeper pattern understanding, and improved model interpretability. However, challenges include interpreting clustering results, information loss, and parameter selection complexity.
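The clustering step can be illustrated with a minimal K-means sketch on two-dimensional points, which stand in for embeddings that have already been reduced (for example by UMAP or PCA). This is not the ct-UAE pipeline; the points and the fixed initial centroids are hypothetical, and real implementations typically use randomised or k-means++ initialisation.

```python
def kmeans(points, centroids, max_iter=100):
    """Plain K-means: alternate nearest-centroid assignment and centroid update."""
    centroids = [tuple(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)),
                key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])))
            for point in points
        ]
        if new_labels == labels:  # assignments stopped changing, so we have converged
            break
        labels = new_labels
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical 2-D embeddings forming two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(labels, centroids)
```

The result depends on the number of clusters and on the initial centroids, which is one concrete form of the parameter selection complexity noted above.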

3.5 Machine learning model-driven feature extraction

ML models are employed to extract features and automatically identify those that are most relevant to a target property. This process relies on the internal mechanisms of Random Forest (RF) (Tang et al., 2020) and deep learning models, such as Convolutional Neural Networks (CNNs) (Khozeimeh et al., 2022). RF can analyze the importance of features and select the key ones that significantly influence the predicted target. In contrast, deep learning models can directly extract features from materials’ structural data to facilitate structural classification or performance prediction. Zhao et al. (2022) evaluated ML models for predicting halide perovskite stability and bandgaps, finding that nonlinear ensemble models achieved excellent accuracy while revealing structure-property relationships. Li X. et al. (2019) developed a CNN screening model using Magpie features, successfully identifying novel perovskites from 21,316 hypothetical structures. While offering advantages like automated feature extraction and improved accuracy, these approaches face challenges including model complexity, interpretability limitations, and overfitting risks with limited training data.

In summary, ML models can achieve efficient extraction of key features of perovskite materials through feature importance analysis or direct modelling of structured data, laying the foundation for subsequent performance predictions. Figure 4i displays the Pearson correlation matrix, indicating a weak correlation between the tolerance factor and octahedral distortion. Including these two structural features as descriptors helps avoid feature redundancy and ensures complementary information. To further clarify the relationship between the various feature engineering methods discussed above and the machine learning models, as well as the target predictive attributes, a clear framework of ‘features-models-applications’ is provided for the construction of subsequent predictive models. Table 3 systematically summarises the core feature engineering techniques, types of input features, compatible machine learning methods, and corresponding target attributes involved in data-driven research on lead-free perovskite materials. This will serve as an important reference for Section 4, which elaborates on the principles and applications of various predictive models.

Table 3

Table 3. Application of machine learning methods in material property-related tasks.

4 Predictive model of perovskite materials

Machine learning, through its data processing capabilities, offers effective methods for predicting the performance and designing the structure of perovskite materials. Predictive models are categorised into four types: supervised learning, semi-supervised learning, unsupervised learning, and transfer learning. This section provides a detailed introduction to the principles, advantages, and applications of these models.

4.1 Supervised learning model

4.1.1 Regression model

As the most basic method in regression analysis, the linear regression model assumes a linear relationship between input features and the target variables, and uses Ordinary Least Squares (OLS) to minimize the sum of squared errors to fit the model (Fitrianto and Xin, 2022). This method is effective for predicting specific performance parameters of perovskite materials that are linearly correlated, such as density (Kusuma et al., 2025). Figure 5 shows the basic process of supervised machine learning: the dataset is divided into training and testing parts, and the algorithm builds a model using the training data and evaluates the model’s performance using the testing data.
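The train/test workflow and the OLS fit described above can be sketched for a single feature, where the least-squares solution has a closed form. The data are synthetic and noise-free (y = 2x + 1), and the variable names are hypothetical; the example only shows the mechanics, not a real perovskite property.

```python
def fit_ols(x, y):
    """Closed-form ordinary least squares for one feature: y ~ slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return slope, intercept


# Synthetic, noise-free data generated from y = 2x + 1 (an assumption for illustration).
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
y = [2.0 * v + 1.0 for v in x]

# Hold out the last two samples as the test set.
split = 6
x_train, x_test = x[:split], x[split:]
y_train, y_test = y[:split], y[split:]

slope, intercept = fit_ols(x_train, y_train)
predictions = [slope * v + intercept for v in x_test]
test_mse = sum((p - t) ** 2 for p, t in zip(predictions, y_test)) / len(y_test)
print(slope, intercept, test_mse)
```

With real, noisy data the test error would be non-zero, and comparing it against the training error is what reveals overfitting.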

Figure 5

Figure 5. Supervised learning model technology in perovskite material prediction.

Ridge Regression (RR) is a regularised form of linear regression that addresses overfitting when features are highly correlated by adding an L2 regularisation term to the loss function (Hastie, 2020). The method enhances the model’s stability by penalizing the sum of squares of the model coefficients. When predicting the properties of perovskites, RR can effectively enhance the model’s generalization ability if the number of features is large and multicollinearity exists. Li R. et al. (2021) predicted the thermodynamic stability, crystal volume and oxygen vacancy formation energy of perovskite materials using RR. The results show that RR performs best when predicting thermodynamic stability. Bayesian Ridge Regression (BRR) combines Bayesian statistics with ridge regression, retaining the L2 regularisation (Hogg and Villar, 2021) while estimating the distributions of the model parameters within a Bayesian framework. It provides uncertainty estimates and is suitable for probabilistic analysis.

Tao et al. (2021) used models such as Gradient Boosting Regression (GBR) and Random Forest (RF), with elemental electronegativity and atomic radius as features, to predict the band gaps of ABX3-type and double perovskite materials. They screened materials that met the requirements of solar cells (1.7–3.0 eV), and some of the models had an R2 value of over 0.85. Yang et al. (2021) used a support vector regression (SVR) model to predict perovskite bandgaps and screened out oxide double perovskites suitable for photovoltaic applications from a large number of virtual samples.

Random Forest Regression (RFR) enhances prediction accuracy through ensemble decision trees, reducing overfitting risks via inherent randomness (Breiman, 2001). Its robustness with high-dimensional data makes it well suited to perovskite property prediction. Zhang J. et al. (2023) applied RFR to 1,306 double perovskite bandgaps, identifying bulk modulus, superconducting temperature, and cation electronegativity as key determinants. Similarly, Sharma et al. (2023) demonstrated RFR’s effectiveness in predicting formation energies and band gaps of sulfur-doped perovskites, successfully identifying optimal doped compounds for photovoltaic applications. The method excels in handling complex feature spaces while maintaining interpretability of critical performance drivers.

Gradient Boosting Regression (GBR) is an ensemble learning method that improves prediction accuracy by adding new models step by step. It minimises the loss function by fitting a new model to the residuals in each iteration. GBR excels at handling complex datasets, non-linear relationships, and large-scale data (Friedman, 2001). Extreme Gradient Boosting (XGBoost) is an optimized gradient-boosting algorithm that enhances performance by introducing regularization terms and optimizing the tree structure. It performs well in large-scale data processing and handles missing values automatically. Touati et al. (2024) classified the crystal structures of 381 halide and oxide perovskites using XGBoost, achieving an accuracy rate of 76.62%.

Gaussian Process Regression (GPR) is a probabilistic model that assumes the relationships between data points follow a Gaussian distribution, modeled through covariance calculations (Henderson et al., 2023). The advantage of GPR lies in its ability to provide uncertainty estimates alongside its predictions while making minimal assumptions about the data distribution.

An Artificial Neural Network (ANN) mimics the connection structure of neurons in the human brain. By processing complex nonlinear data through multi-layer networks, it shows strong adaptability (Katsikioti, 2024; Kumar and Sivamani, 2021). Rahman et al. (2025) combined high-throughput synthesis, high-resolution spectroscopy techniques and machine learning, not only efficiently synthesizing perovskites with multiple chemical compositions but also accurately predicting their chemical compositions from optical data. This research offers a novel method for screening and optimizing perovskite materials.

The Lasso model is used for feature selection and regression analysis. By incorporating an L1 regularization term, it achieves sparse solutions, which reduces model complexity and enhances interpretability (Ueno et al., 2021). In materials science, it helps predict the thermoelectric properties of perovskites and supports the design and optimization of new materials.
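For reference, the linear regression variants discussed in this subsection differ only in the penalty added to the least-squares objective. The notation below is an assumption introduced for illustration (feature vectors x_i, targets y_i, weights w, regularisation strength lambda >= 0) and does not come from the cited works.

```latex
\begin{align}
  \text{OLS:}   \quad & \hat{w} = \arg\min_{w} \sum_{i=1}^{n} \left( y_i - x_i^{\top} w \right)^2 \\
  \text{Ridge:} \quad & \hat{w} = \arg\min_{w} \sum_{i=1}^{n} \left( y_i - x_i^{\top} w \right)^2 + \lambda \lVert w \rVert_2^2 \\
  \text{Lasso:} \quad & \hat{w} = \arg\min_{w} \sum_{i=1}^{n} \left( y_i - x_i^{\top} w \right)^2 + \lambda \lVert w \rVert_1
\end{align}
```

The squared L2 penalty shrinks all coefficients smoothly, which stabilises the fit under multicollinearity, whereas the L1 penalty can drive some coefficients exactly to zero, which is why Lasso doubles as a feature selection method.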

For bandgap prediction, a property where DFT typically underestimates values by 0.5–1.0 eV, hybrid ML models integrating high-level computational data have shown significant improvements. Guo and Lin (2021) used four machine learning algorithms, including random forest and ridge regression, to build models. The results showed that the random forest model had the best predictive performance; this model can effectively capture the nonlinear relationship between the band gap, the highest occupied energy level (hoe_b1), and the cubic phase structure, providing support for the efficient development of perovskite photovoltaic materials. The optimised model reduced the band gap prediction error for lead-free perovskites (e.g., Cs2AgBiBr6) from 0.48 eV to 0.19 eV, matching experimental measurements. Li and Wang (2024) generated 1,200 TD-DFT optical absorption spectra for hybrid organic-inorganic perovskites (HOIPs) with different A-site cations. This dataset was used to train a CNN model for predicting absorption edge wavelengths, achieving a root-mean-square error (RMSE) of 5 nm—far better than models trained on DFT.
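The error metrics quoted in this subsection, root-mean-square error and the coefficient of determination R2, can be computed as in the following sketch. The measured and predicted band gaps are made-up numbers used only to show the arithmetic.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root-mean-square error between reference and predicted values."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up band gaps in eV for four hypothetical compounds.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

print("RMSE (eV):", rmse(measured, predicted))
print("R2:", r2_score(measured, predicted))
```

RMSE carries the unit of the target (eV or nm), so it is directly comparable with experimental uncertainty, while R2 is unitless and describes the fraction of variance the model explains.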

4.1.2 Classification model

Logistic regression is a generalized linear model mainly used for binary classification. It uses the sigmoid function to map the output of a linear model to the (0,1) range, where a probability close to 1 indicates the positive class and a value close to 0 indicates the negative class. In perovskite research, a logistic regression model takes material properties as input and predicts the probability that a material has a specific attribute, for example whether it is thermodynamically stable (Zhu et al., 2024; Sun and Yin, 2017) or has a bandgap in the ideal range. This provides a preliminary screening tool for experimental research.
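The mapping from a linear score to a class probability can be sketched as follows. The weights, bias, and feature values are hypothetical and hand-picked; in practice they would be fitted to labelled data.

```python
from math import exp


def sigmoid(z):
    """Logistic function mapping any real score into the open interval (0, 1)."""
    return 1.0 / (1.0 + exp(-z))


def predict_proba(features, weights, bias):
    """Probability of the positive class for one sample."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return sigmoid(z)


def predict_label(features, weights, bias, threshold=0.5):
    """Assign class 1 when the predicted probability exceeds the threshold."""
    return 1 if predict_proba(features, weights, bias) > threshold else 0


# Hypothetical, hand-picked parameters for two scaled descriptors.
weights, bias = [4.0, -2.0], -3.0

p = predict_proba([1.5, 0.5], weights, bias)  # linear score z = 2
print(round(p, 4), predict_label([1.5, 0.5], weights, bias))
```

The 0.5 threshold is a default choice; screening tasks that must avoid discarding promising candidates may lower it to trade precision for recall.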

RF is an ensemble learning method that constructs multiple decision trees and integrates their results; Figure 6 shows a schematic of the RF classification process. By combining many weak classifiers, the method improves classification accuracy and stability, and it can be used both to predict material properties and to evaluate feature importance. Sudha Priyanga et al. (2022) used the random forest algorithm to predict the bandgap characteristics of perovskite oxides, considering multiple factors, and trained the model on a computationally generated dataset. The algorithm achieved an accuracy of approximately 91%, outperforming the other models tested.

Figure 6
Flowchart illustrating a decision-making process for dataset classification.

Figure 6. Random Forest algorithm for classification.

As shown in Figure 7a, the support vector machine (SVM) is a classification model that seeks the optimal hyperplane separating data classes by maximizing the margin between them (Tharwat, 2020). When the data are linearly separable, SVM determines this hyperplane directly by maximizing the margin. When they are not, SVM employs a kernel function to map the data into a higher-dimensional space in which linear separation becomes possible. SVM is commonly used in the evaluation of perovskite materials, particularly for predicting properties such as bandgaps (Yang C. et al., 2023) and formation energies. Although SVM is effective for both high-dimensional and small-sample data, it is sensitive to the choice of kernel function, requires careful parameter tuning, and has higher computational complexity than some other models.

As shown in Figure 7b, Decision Trees (DTs) classify data through hierarchical splitting based on features. Jacobs et al. (2024) employed a Random Forest (an ensemble of DTs) model to predict catalytic properties using computationally efficient features. This approach screened more than 19 million perovskites and identified cost-effective, stable, and high-performance candidates. DT methods enable rapid property prediction while maintaining interpretability through their tree-based structure.

Neural networks (NNs) consist of interconnected neurons arranged in input, hidden, and output layers, and they learn features and patterns in the data by adjusting the connection weights (Figure 7c). Architectures such as the Multi-layer Perceptron (MLP), Convolutional Neural Network (CNN), and Recurrent Neural Network (RNN) (Tena et al., 2021; Benidis et al., 2022) compute outputs through forward propagation and optimize weights via backpropagation to minimize the loss function. Sun and Yuan (2023) constructed the Crystal Graph and Chemical Restriction Attention-based Network (CGCR-ABNET) model using transfer learning, which effectively predicted the bandgap of materials. Xie and Grossman (2018a) used the Crystal Graph Convolutional Neural Network (CGCNN) framework to predict the total formation energy of perovskite crystals. The model’s error was comparable to the discrepancy between DFT-calculated and experimental values, indicating high accuracy.

Figure 7
Illustration depicting various machine learning models: (a) Scatter plot with two classes separated by a linear boundary, representing a classification problem. (b) Decision tree diagram showing binary splits for classification. (c) Diagram of a neural network with input, hidden, and output layers. (d) K-means clustering diagram with K equals three, showing grouped data points. (e) Diagram of a boosting algorithm with multiple models combined to improve accuracy. (f) Schematic of a random forest with multiple decision trees showing ensemble learning.

Figure 7. Schematic representation of the principles of the classification algorithm for (a) SVM (Sudha Priyanga et al., 2022). (b) DT (Sun and Yuan, 2023). (c) NNs (Yang C. et al., 2023; Jacobs et al., 2024). (d) KNN (Zhang, 2022). (e) XGBoost (Chen M. et al., 2024). (f) LightGBM (Wang J. et al., 2024).

K-Nearest Neighbors (KNN) is an instance-based learning method that classifies a new sample by calculating its distance to the samples in the training set and identifying the K closest ones, known as nearest neighbors (Figure 7d). The class of the new sample is determined from the categories of these neighbors, usually by majority voting. The model is simple to implement, requires no explicit training phase, and performs well on small datasets. However, it is computationally intensive for large-scale, high-dimensional data and is sensitive to noise and outliers.
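The distance-and-vote procedure can be sketched in a few lines of plain Python. The two descriptors and the class labels below are hypothetical and serve only to illustrate the mechanics; Euclidean distance and an unweighted majority vote are assumptions of this sketch.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples."""
    # Pair every training sample's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    # Count the labels of the k closest samples and return the most frequent one.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical (tolerance factor, octahedral factor) pairs with known outcomes.
train_X = [(0.85, 0.54), (0.90, 0.50), (0.88, 0.60),
           (0.70, 0.30), (0.72, 0.35), (0.68, 0.28)]
train_y = ["perovskite"] * 3 + ["non-perovskite"] * 3

print(knn_predict(train_X, train_y, (0.86, 0.52)))
print(knn_predict(train_X, train_y, (0.71, 0.31)))
```

Because every prediction scans the whole training set, the cost grows linearly with the number of stored samples, which is the scaling limitation noted above.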

XGBoost is an ensemble learning method based on gradient boosting (Figure 7e). By optimizing a regularized objective function, it mitigates the overfitting common in traditional gradient-boosting algorithms and improves computational efficiency. Chen M. et al. (2024) trained an XGBoost model to predict the formation energy of perovskite materials and combined it with the Shapley Additive Explanations (SHAP) method.

Adaptive Boosting (AdaBoost) is an ensemble learning technique that operates within the Boosting framework. It trains multiple weak classifiers iteratively and integrates them into a single strong classifier. During each iteration, AdaBoost adjusts the weights of the samples based on those that were misclassified in the previous round. It merges the results of the weak classifiers using a weighted voting mechanism to form the final strong classifier. Hayee et al. (2016) applied AdaBoost to predict the bandgap of perovskite materials. They trained the model by incorporating chemical and crystal structure characteristics, along with experimental data. The results indicated that AdaBoost performed well in predicting the bandgap.
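The reweighting step can be illustrated with one round of the discrete (binary classification) form of AdaBoost. This is a sketch under stated assumptions: labels are encoded as -1 and +1, the weak classifier's predictions are given rather than trained, and the five samples are hypothetical.

```python
import math

def adaboost_round(weights, y_true, y_pred):
    """One round of discrete AdaBoost with labels in {-1, +1}.

    Returns the classifier weight alpha and the renormalized sample weights.
    Assumes the weighted error lies strictly between 0 and 1.
    """
    # Weighted error: total weight of the misclassified samples.
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Misclassified samples (t * p == -1) are up-weighted, correct ones down-weighted.
    updated = [w * math.exp(-alpha * t * p)
               for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(updated)
    return alpha, [w / total for w in updated]

# Five hypothetical samples with equal initial weights; the last one is misclassified.
y_true = [+1, +1, -1, -1, +1]
y_pred = [+1, +1, -1, -1, -1]
weights = [0.2] * 5

alpha, new_weights = adaboost_round(weights, y_true, y_pred)
print(f"alpha = {alpha:.4f}")
print([round(w, 3) for w in new_weights])
```

After this round the misclassified sample carries half of the total weight, so the next weak classifier is pushed to get it right. The final strong classifier takes the sign of the alpha-weighted sum of the weak classifiers' outputs.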

Light Gradient Boosting Machine (LightGBM) is an efficient gradient-boosting decision tree algorithm that enhances model accuracy by constructing multiple decision trees (Figure 7f). It accelerates the search for feature split points using a histogram-based algorithm and grows trees with a leaf-wise strategy. LightGBM also introduces gradient-based one-side sampling (GOSS) and exclusive feature bundling (EFB) to further increase computational speed. Wang J. et al. (2024) applied the LightGBM algorithm to predict the thermodynamic phase stability of organic-inorganic hybrid perovskite materials. The model predictions were interpreted using SHAP value analysis to identify the main features that affect the thermodynamic phase stability of perovskites.

4.2 Semi-supervised learning

Self-training (Amini et al., 2023) begins by training an initial model on a small amount of labeled perovskite material data. The model then predicts outcomes for the unlabeled data, adds the samples it predicts with high confidence to the labeled dataset, and retrains itself. Iterating this process improves the model’s performance and generalization. Co-training (Tao et al., 2021) constructs two different classifiers or models that learn from different features or perspectives on the perovskite material data. The two classifiers cooperate during training, alternately using each other’s predictions to extend the labeled dataset and thus continuously improving their respective performance. Graph Semi-supervised Learning (Graph SSL) (Gu et al., 2022) represents perovskite material data as a graph, with nodes representing material samples and edges indicating similarities or correlations between samples. Unlabeled nodes are labeled by propagating information from the labeled nodes through the graph. This approach makes effective use of a large amount of unlabeled data while exploiting the intrinsic connections and similarities among perovskite materials. Figure 8 outlines this end-to-end graph-based learning process.
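The self-training loop can be sketched as follows. To keep the example self-contained, the base model is a one-dimensional nearest-centroid classifier and the confidence score is a simple distance ratio; both are assumptions made for this illustration, as are the descriptor values and the class names. The sketch also assumes at least two classes are present in the labeled set.

```python
def fit_centroids(X, y):
    """Mean descriptor value per class (1-D nearest-centroid classifier)."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict_with_confidence(centroids, x):
    """Return the nearest class and a confidence in [0.5, 1] from the distance ratio."""
    ranked = sorted((abs(x - c), label) for label, c in centroids.items())
    (d1, best), (d2, _) = ranked[0], ranked[1]
    confidence = 1.0 - d1 / (d1 + d2) if (d1 + d2) > 0 else 0.5
    return best, confidence

def self_train(X_lab, y_lab, X_unlab, threshold=0.8, max_rounds=10):
    """Iteratively move confidently predicted samples into the labeled set."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_rounds):
        centroids = fit_centroids(X_lab, y_lab)
        confident, remaining = [], []
        for x in pool:
            label, conf = predict_with_confidence(centroids, x)
            (confident if conf >= threshold else remaining).append((x, label))
        if not confident:          # nothing new to add: stop iterating
            break
        for x, label in confident:  # pseudo-labels join the training data
            X_lab.append(x)
            y_lab.append(label)
        pool = [x for x, _ in remaining]
    return fit_centroids(X_lab, y_lab), pool

# Hypothetical scaled descriptor values: two labeled samples, five unlabeled.
centroids, unresolved = self_train(
    [0.0, 1.0], ["unstable", "stable"], [0.05, 0.1, 0.9, 0.95, 0.5]
)
print(centroids, unresolved)
```

The sample at 0.5 sits midway between the classes, never reaches the confidence threshold, and is therefore left unlabeled, which is the intended behavior: only high-confidence pseudo-labels are allowed to influence retraining.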

Figure 8
Flowchart showing a machine learning process with six main stages: data preparation, feature engineering, model initialization, label propagation and pseudo-label generation, model training and optimization, and model evaluation and application. Each stage includes specific actions like collecting data, constructing graph structures, pre-training, and hyperparameter tuning. Arrows indicate the progression from one stage to the next.

Figure 8. Flowchart of Graph SSL algorithm.

4.3 Unsupervised learning

Figure 9 shows an end-to-end unsupervised machine-learning workflow from raw material data to new material design. K-Means cluster analysis can group perovskite data, including constituent elements, structural parameters, and performance indexes (Laufer et al., 2023). Jiang et al. (2024) employed K-Means clustering to preprocess the data, which enhanced the accuracy of performance predictions for perovskite oxides in the oxygen evolution reaction. Saha et al. (2021) applied an agglomerative hierarchical clustering algorithm to screen compounds with potential for photovoltaic applications from a dataset of 540 halide double perovskites. The Density-based Spatial Clustering of Applications with Noise (DBSCAN) algorithm is well-suited to perovskite data with complex distributions: it automatically detects irregularly shaped clusters and remains effective in the presence of noise and outliers (Ghezelbash et al., 2025).

t-distributed Stochastic Neighbor Embedding (t-SNE) is frequently used to visualize high-dimensional perovskite data. It maps the data into two or three dimensions so that similar data points cluster together and dissimilar points are easier to distinguish (Cai and Ma, 2022). PCA simplifies complex perovskite data by extracting principal components (Boubchir et al., 2022). It can transform a perovskite dataset containing elemental composition, structural information, and performance parameters into a few principal components that explain most of the variance in the data. Raihan et al. (2023) applied PCA to three materials science datasets, reducing multiple features to two principal components to facilitate intuitive analysis and visualization of the patterns and relationships within the datasets.
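As an illustration of the clustering step, the sketch below implements the basic K-Means iteration (assign each point to the nearest centroid, then recompute centroids) in plain Python. The two-dimensional points and the choice of initial centroids are hypothetical; production work would normally rely on a library implementation with better initialization.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Basic K-Means (Lloyd's algorithm) starting from the given centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:     # converged: assignments are stable
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical scaled descriptors for six materials forming two groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
          (5.0, 5.0), (5.2, 4.9), (4.9, 5.3)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

The number of clusters K and the starting centroids must be supplied by the user, and the result can depend on that choice, which is one reason density-based methods such as DBSCAN are preferred for irregular distributions.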

Figure 9
Flowchart depicting the machine learning process with six main stages: Data Collection and Preprocessing, Feature Extraction and Dimension Reduction, Unsupervised Learning Algorithms, Structure and Performance Analysis, Model Validation and Optimization, and Forecasting and Applications. Detailed subcomponents include data attributes, operations, features, methods like PCA, and algorithms such as K-Means.

Figure 9. The application of unsupervised learning in material exploration.

4.4 Transfer learning

The core advantage of transfer learning lies in leveraging large-scale related datasets to pre-train a source model, which can then be fine-tuned to adapt to small sample perovskite data. This approach enables accurate and efficient predictions of the physical properties of materials, conserving computational resources and enhancing the model’s generalisation ability. In the prediction of the stability of perovskite oxides, Li Y. et al. (2023) constructed a framework by combining a ‘centre-environment’ feature model with deep neural networks. After pre-training and fine-tuning with a small amount of data, the model’s prediction accuracy significantly improved, successfully screening 1,314 types of stable perovskite structures, thereby validating its effectiveness in small-sample material screening.

Furthermore, low-level computational methods such as DFTB and ReaxFF, and DFTB in particular, are crucial in addressing data scarcity. Vicent-Luna et al. (2021) selected 18 ABX3-type metal halide perovskites (MHPs) and systematically evaluated the performance of the GFN1-xTB method in calculating their structural, electronic, and vibrational properties. By comparing the results with DFT calculations and experimental data, they verified its applicability as an efficient alternative to DFT and pointed out limitations that still need to be addressed.

5 Screening of high-performance perovskite candidates

With the development of computational and intelligent technologies, efficient theoretical screening strategies have become crucial. This section elaborates on four core screening methods (structural descriptor methods, DFT calculations, high-throughput computations, and machine learning) and discusses their principles, advantages and disadvantages, and application examples.

5.1 Structure descriptor method

5.1.1 Element substitution

The elemental substitution method investigates the properties of new materials by changing the elements at specific positions in the perovskite structure. Its advantage is that new material combinations can be generated quickly from known perovskite structures, and potential materials can be initially screened using theoretical calculations. Jacobsson et al. (2015) found that Sr is a suitable candidate for replacing Pb in perovskites, given the nearly identical ionic radii of Sr2+ and Pb2+. This substitution does not change the crystal structure of the material, and the stability of CH3NH3SrI3 is higher than that of CH3NH3PbI3. In addition, Tian et al. (2025) proposed a divalent cation substitution strategy that alleviates ion migration while limiting phase separation. Both problems in wide-bandgap perovskites could be significantly suppressed by partially replacing A-site cations with methylene diammonium cations (MDA2+).

5.1.2 Tolerance factor

The Tolerance Factor (TF), denoted t, is a parameter for assessing the stability of perovskite structures. Perovskite structures typically form when t lies between roughly 0.8 and 1.0, and the structure is generally more stable when t is closer to 1. By calculating the tolerance factor for different material combinations, a preliminary assessment of structural stability can be made. Turnley et al. (2024) proposed a new three-step screening process, which consists of an initial screening based on octahedral factors, a modified tolerance factor screening, and a screening for electronegativity differences.
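For reference, the classical Goldschmidt form of the tolerance factor is t = (r_A + r_X) / [√2 (r_B + r_X)], where r_A, r_B, and r_X are the ionic radii of the A-site, B-site, and X-site ions. The short sketch below evaluates it for CsPbI3. The radii are approximate Shannon values in ångströms quoted here for illustration only, and the modified tolerance factor used by Turnley et al. (2024) is not reproduced.

```python
import math

def tolerance_factor(r_a, r_b, r_x):
    """Goldschmidt tolerance factor t = (r_A + r_X) / (sqrt(2) * (r_B + r_X))."""
    return (r_a + r_x) / (math.sqrt(2) * (r_b + r_x))

# Approximate Shannon ionic radii in angstroms (illustrative values).
RADII = {"Cs+": 1.88, "Pb2+": 1.19, "I-": 2.20}

t_cspbi3 = tolerance_factor(RADII["Cs+"], RADII["Pb2+"], RADII["I-"])
print(f"CsPbI3: t = {t_cspbi3:.3f}")
```

With these radii the result is about 0.85. Because the outcome depends directly on the radii chosen, screening studies should state which radius set and coordination numbers they use.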

5.1.3 Octahedral factors

The octahedral factor (μ), defined as the ratio of the B-site ionic radius to the X-site ionic radius, measures the degree of size matching between B-site ions and X-site ions. Typically, a μ value within the range that permits octahedral coordination (geometrically, above about 0.414) indicates that the B-site ion fits well within the octahedron formed by X-site ions, which supports the stability of the perovskite structure. The structural stability after B-site ion replacement can be predicted using the octahedral factor. Recent studies have screened 760 Cs2B2+B’2+X6 double perovskites using high-throughput methods, which include tolerance factor screening, optical property calculation, and hybrid functional correction (Li M. et al., 2023). By analyzing 173 ABO3 perovskites, Kumar et al. (2008) found that the octahedral factor is as important as the TF and proposed a two-dimensional empirical structure map based on both factors for predicting the formability of perovskite-type oxides.
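The octahedral factor is commonly computed as the radius ratio μ = r_B / r_X. The sketch below evaluates it for CsPbI3 and checks it against the geometric radius-ratio window for six-fold coordination (about 0.414 to 0.732). That window is an assumption of this example; empirical limits reported in the literature for specific perovskite families differ, and the radii are approximate Shannon values used for illustration.

```python
def octahedral_factor(r_b, r_x):
    """Octahedral factor mu = r_B / r_X."""
    return r_b / r_x

def fits_octahedron(mu, lower=0.414, upper=0.732):
    """True if mu lies inside the assumed radius-ratio window for BX6 octahedra."""
    return lower <= mu <= upper

# Approximate Shannon ionic radii in angstroms (illustrative values).
r_pb, r_i = 1.19, 2.20

mu_cspbi3 = octahedral_factor(r_pb, r_i)
print(f"CsPbI3: mu = {mu_cspbi3:.3f}, octahedral fit: {fits_octahedron(mu_cspbi3)}")
```

Plotting μ against the tolerance factor for a set of known compounds gives the kind of two-dimensional structure map described above, in which formable perovskites occupy a bounded region.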

5.2 Density functional theory calculation

Density functional theory (DFT) calculation methods, based on quantum mechanics, are widely used in the study of electronic structure in materials. DFT calculations can be leveraged to predict the stability and optoelectronic properties of perovskites by calculating parameters such as the energy band structure, density of states, and total energy (Lu et al., 2021) and to screen out candidate materials with excellent properties. Its advantage is that it can provide accurate theoretical predictions and help researchers understand the physicochemical properties of materials in depth. However, the computational cost is relatively high, so it is mainly used for the accurate analysis of a small number of candidate materials. Islam and Hossain (2020) investigated the structural, optical, electronic and mechanical properties of CsSnCl3 perovskites under different hydrostatic pressures by DFT calculations. It was found that the absorption edge of CsSnCl3 shifted to the low-energy region with increasing pressure, and its absorbance, conductivity, and dielectric constant increased. The DFT calculations provide a crucial theoretical foundation for the study of perovskite materials.

5.3 High-throughput calculations

High-throughput computing combines computational simulation and data processing to utilize large-scale computations for screening materials based on specific properties. In the screening of perovskite materials, high-throughput computing utilizes an automated computational process that can rapidly process data from vast material combinations to screen candidates with superior properties quickly. The advantage of this technique is its speed in processing large amounts of data, which improves screening efficiency. However, it faces the difficulties of computational accuracy and complexity of data processing.

5.4 Machine learning approach

Over the past several years, ML techniques have gained increasing popularity in materials science. By constructing datasets and training ML models, one can quickly predict the properties of perovskite materials and thereby screen for potential high-performance candidates. The advantages of ML methods lie in their ability to capture highly complex non-linear relationships and their efficiency in screening a large number of candidates. Commonly used machine learning algorithms include RF, NNs, SVM, and Gradient-boosting Regression Trees (GBRT) (Peng et al., 2024). Zhai et al. (2022) employed a neural network model in conjunction with elemental characterization to accurately predict the area-specific resistance (ASR) of perovskite oxides and to screen out high-performing materials. Landini et al. (2022) achieved a reasonably accurate prediction of bandgaps by training a machine-learning model on 200 structures. Wu and Wang (2019) screened 230,808 HOIP candidates by combining ML and DFT techniques, identifying 132 promising materials. Thus, machine learning can efficiently screen perovskite candidates with excellent properties, providing strong support for experimental synthesis and practical applications.

6 Examples of predictive studies on perovskite materials

When studying perovskite materials, traditional methods are inefficient and struggle to cope with the vast compositional space. Predictive research that combines high-throughput data and machine learning can accurately anticipate material properties and swiftly screen candidate materials. This section provides an overview of the predictive research on perovskite materials across seven core directions, including bandgap, stability, and photoconversion efficiency, covering mainstream algorithms, model accuracy, and challenges.

6.1 Bandgap performance prediction

The bandgap is an electronic structural property that reflects the energy separation between the top of the valence band and the bottom of the conduction band. The size of the bandgap determines which photons a material can absorb and emit: the wider the bandgap, the higher the minimum photon energy the material absorbs; the narrower the bandgap, the lower that energy. Figure 10a illustrates the prediction process through the pathway ‘extracting material physicochemical characteristics → constructing a model → outputting bandgap prediction results’, reflecting the practicality of machine learning in bandgap prediction. Researchers often use bandgap prediction to assess the suitability of materials for optoelectronic devices, such as solar cells, which require a suitable bandgap to maximize sunlight absorption, and light-emitting diodes, whose bandgap determines the color of the emitted light. Predicting the bandgap enables rapid identification of promising materials and facilitates the development of new ones. The following text introduces the application of various machine learning models to predicting the perovskite bandgap.
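The link between bandgap and the light a material absorbs or emits follows from the photon energy relation E = hc/λ, which in convenient units reads λ (nm) ≈ 1239.84 / E (eV). The helper functions below are a small illustrative sketch; the function names and the 1.55 eV example value are chosen here for demonstration and are not taken from the cited studies.

```python
HC_EV_NM = 1239.84  # Planck constant times speed of light, in eV*nm

def absorption_edge_nm(bandgap_ev):
    """Wavelength (nm) of a photon whose energy equals the bandgap (eV)."""
    if bandgap_ev <= 0:
        raise ValueError("bandgap must be positive")
    return HC_EV_NM / bandgap_ev

def bandgap_from_edge_ev(wavelength_nm):
    """Bandgap (eV) estimated from an absorption-edge wavelength (nm)."""
    if wavelength_nm <= 0:
        raise ValueError("wavelength must be positive")
    return HC_EV_NM / wavelength_nm

edge = absorption_edge_nm(1.55)
print(f"A 1.55 eV bandgap corresponds to an absorption edge near {edge:.0f} nm")
```

A wider bandgap therefore shifts the absorption edge to shorter wavelengths, which is why bandgap tuning controls both the usable part of the solar spectrum and LED emission color.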

Figure 10
Diagram showing a three-part process. (a) Illustrates a 2D \( \text{A}_2\text{BX}_4 \) perovskite leading to a descriptor matrix, then to a machine learning model for prediction. (b) Displays the structure of layered perovskite with cation variations at sites A and B, and halides at the X site. (c) Graph plotting predicted versus experimental energy values, showing data points along a line.

Figure 10. (a) Predicting the band gap of HOIPs via machine learning methods. Reproduced with permission from Ref. (Wan et al., 2021). (b) The composition of 136 2D HOIPs. Reproduced with permission from Ref. (Wan et al., 2021). (c) Scatter plots comparing predicted and experimental values. Reproduced with permission from Ref. (Wan et al., 2021).

Sabagh Moeini et al. (2024) investigated the application of machine learning in predicting the bandgap of low-symmetry perovskites, using models based on the characteristic features of 40 elements. Wu and Wang (2020) applied the gradient boosting regression (GBR) algorithm to predict the electronic bandgap of HOIPs. The model’s hyperparameters were optimized by grid search with cross-validation, and 32 features were created to describe the physical and chemical properties of HOIPs. A total of 209 orthorhombic-like HOIPs with suitable bandgaps were screened. The compositional diversity of the 136 2D HOIPs in Figure 10b illustrates the high-dimensional screening challenge that machine learning needs to address. It provides a diverse set of feature samples for the model and lays a data foundation for the subsequent screening of 2D HOIPs with suitable bandgaps. Figure 10c is key to validating the model’s accuracy: the scatter distribution shows that the predicted values closely fit the experimental values, supporting the reliability of the ML models. This also speaks to the low efficiency of traditional methods mentioned at the start of Section 6, demonstrating that machine learning can replace certain repetitive experiments and expedite material screening.

Sudha Priyanga et al. (2022) constructed a database of 5,329 perovskite oxides and selected a variety of elemental attributes as features for predicting their bandgap properties. The RF algorithm performed best in predicting the bandgap type, with an accuracy of 91%. The study reveals the effect of key features, such as average ionic properties and electronegativity, on the bandgap type and further explains the model predictions using SHAP analysis. Figure 11 shows an interpretable bandgap prediction process with SHAP interpretation. Ghosh and Chowdhury (2024) combined ML models with first-principles DFT calculations on 1,563 inorganic nitride perovskite (ABN3) data points to investigate bandgap prediction for these materials. Of the four ML models used, the random forest regression (RFR) model achieved the highest prediction accuracy, and the bandgap values of two new nitride perovskites, CeMoN3 and CeWN3, were predicted. Li Y. et al. (2021) predicted the bandgap of perovskite materials through machine learning, aiming to enhance the performance of solar cells and LEDs. Their results indicate that machine learning algorithms can accurately predict the bandgap, with neural network algorithms in particular exhibiting high accuracy (an RMSE of 0.05 eV and a Pearson coefficient greater than 0.99). They further indicated that the bandgap is affected by A-site cations and halide ions acting in concert, so altering these components can modulate the bandgap. Ren et al. (2024) applied an interpretable ensemble learning approach to bandgap prediction of halide perovskite materials using 245 experimental data points and found that the ensemble Decision Tree model achieved high prediction accuracy.

These studies have not only achieved high-precision prediction of the bandgap of hybrid perovskites through machine learning but also revealed its quantitative relationship with physical descriptors, providing new research methods and theoretical support for materials science. Although they have made progress in high-precision prediction and interpretability, they still face challenges posed by small datasets and model complexity.
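The two accuracy metrics quoted above, RMSE and the Pearson correlation coefficient, can be computed as in the following sketch. The measured and predicted bandgap values are hypothetical numbers used only to show the calculation; they do not come from any of the cited studies.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error between reference and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical bandgaps in eV: reference values and model predictions.
measured = [1.5, 2.0, 2.5, 3.0]
predicted = [1.6, 1.9, 2.6, 2.9]

print(f"RMSE = {rmse(measured, predicted):.3f} eV")
print(f"Pearson r = {pearson_r(measured, predicted):.3f}")
```

The two metrics are complementary: RMSE reports the typical size of the error in physical units, while the Pearson coefficient reports how well the predictions track the trend and is insensitive to a constant offset or scale factor.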

Figure 11
Flowchart illustrating a machine learning process for bandgap prediction. The process involves four stages: dataset preparation, machine learning models, bandgap prediction, and explanation. In dataset preparation, features like interatomic distance and crystal structure are extracted. These feed into machine learning models, depicted with a neural network icon, leading to bandgap prediction shown with energy diagrams. The SHAP step provides feature importance, shown with a strategy board. Finally, explanations are visualized with a bar chart of SHAP values. Dotted lines connect steps to show process flow.

Figure 11. Prediction and interpretation of ABX3 type perovskite bandgap.

6.2 Stability predictions

Stability is crucial for perovskite materials, as it directly affects their performance and reliability in practical applications. Stable perovskite materials can maintain their structural and functional properties under different environmental conditions. The stability of perovskite materials is usually assessed by the energy above the convex hull, Ehull (Chen L. P. et al., 2022), an energy-based index of thermodynamic stability. The smaller the Ehull value, the more stable the material is and the less likely it is to decompose or undergo a phase transition. Predicting Ehull enables the screening of perovskite materials with high stability under specific conditions. Next, we explore the application of stability prediction to different kinds of perovskites.
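To make the Ehull idea concrete, the sketch below computes the energy above the hull for a simplified two-component system: the lower convex hull of formation energy versus composition is built, and Ehull is the vertical distance of a phase from that hull. Real perovskite phase diagrams have three or more components and are normally handled with dedicated tools, so this binary case, the composition values, and the energies are all illustrative assumptions.

```python
def cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def lower_hull(phases):
    """Lower convex hull of (composition, formation energy) points."""
    best = {}
    for x, e in phases:                      # keep the lowest energy per composition
        best[x] = min(e, best.get(x, e))
    hull = []
    for p in sorted(best.items()):           # monotone-chain scan, left to right
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()                       # drop vertices lying on or above the chord
        hull.append(p)
    return hull

def energy_above_hull(phases, x, e):
    """Ehull of a phase with composition x and formation energy e (same units)."""
    hull = lower_hull(phases)
    for (x1, e1), (x2, e2) in zip(hull, hull[1:]):
        if x1 <= x <= x2:
            hull_energy = e1 + (e2 - e1) * (x - x1) / (x2 - x1)
            return e - hull_energy
    raise ValueError("composition lies outside the range covered by the hull")

# Hypothetical binary system: (fraction of component B, formation energy in eV/atom).
phases = [(0.0, 0.0), (0.25, -0.30), (0.5, -0.50), (0.75, -0.20), (1.0, 0.0)]

print(f"{energy_above_hull(phases, 0.5, -0.50):.3f}")   # phase on the hull
print(f"{energy_above_hull(phases, 0.75, -0.20):.3f}")  # phase above the hull
```

In this example the phase at composition 0.75 lies 0.05 eV/atom above the tie-line between its neighbors, meaning decomposition into those neighboring phases is energetically favorable, while phases on the hull have Ehull equal to zero.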

Emery and Wolverton (2017) applied a High-Throughput Density Functional Theory (HT-DFT) methodology to 5,329 ABO3 perovskites. The investigation encompassed properties such as formation energy and stability, and 395 compounds were predicted to be thermodynamically stable. Zhao et al. (2024) analyzed 1,133 perovskite oxide data points using an interpretable machine learning approach, focusing on the prediction of thermodynamic stability and Ehull. The classification model, an eXtreme Gradient Boosting Classifier trained on 23 features (XGBC-23), screened 682,143 stable perovskite oxides, and the regression model, an eXtreme Gradient Boosting Regressor trained on 144 features (XGBR-144), effectively predicted the Ehull values of stable perovskites. Zhao and Wang (2022) used an RF classifier trained on 343 known ABO3 compounds to investigate the formability and stability of perovskite oxides. For the lead-free organic-inorganic hybrid perovskite material A2BB’X6, Cai et al. (2022) analyzed 180,038 compounds using machine learning and HT-DFT calculations.

Wu and Wang (2020) first trained a gradient boosting regression model on a dataset generated in a previous round of machine learning. After predicting 209 potentially stable HOIPs through charge neutrality and structural stability screening, they verified them using DFT, ultimately confirming 96 materials with photovoltaic-compatible bandgaps and chemical, thermal, and environmental stability. The selected HOIPs are stable at room temperature, and the perovskite structure remains unchanged in the simulation. In this study, the progressive machine learning method effectively screened out HOIP materials with suitable bandgaps, stability, and non-toxicity.

6.3 Formation energy prediction

The formation energy can be used to measure the stability of perovskite materials. By calculating and regulating the formation energy, the synthesis conditions and properties of materials can be optimized, guiding the application of perovskite materials in fields such as optoelectronic devices.

Xie and Grossman (2018b) employed the Crystal Graph Convolutional Neural Network (CGCNN) in combination with multiple material databases to predict the formation energy of materials. Alhashmi et al. (2023) used DFT calculations with the PBE functional to establish a dataset comprising 81 perovskite compounds. Through analysis with Weka and Matlab, they found that compounds with cubic structures had the highest bandgap and formation energy but the lowest dielectric constant. Zhang et al. (2021) employed an interpretable machine learning approach to predict the formability of organic-inorganic hybrid perovskite structures, utilizing 102 experimental and theoretical data points. The trained XGBoost model excelled in predicting HOIP formation. Through high-throughput screening, 198 non-toxic candidate HOIPs were selected from 18,560 virtual samples, all with a predicted formation probability exceeding 0.99. Xu et al. (2018) investigated the formability of perovskite materials using 3,354 perovskite compound entries, combining data cleaning with machine learning. The research expanded the perovskite data, corrected errors in previous data, and achieved a prediction accuracy of 96.3%.

Breakthroughs have been made in the research on predicting the formation of perovskite materials, and machine learning has facilitated the design and screening of materials. However, issues such as data quality, model interpretability and experimental verification need to be addressed to promote its application in fields such as optoelectronic devices. Future research should combine theoretical calculations with experiments to enhance the accuracy and reliability of prediction models, thereby laying a solid foundation for the development and application of new materials.

6.4 Photoelectric conversion efficiency (PCE) performance prediction

The key performance indicator of perovskite solar cell materials is the photoelectric conversion efficiency, which quantifies the ability to convert light energy into electrical energy. The efficiency of perovskite solar cells has improved continuously in recent years thanks to the material’s excellent light absorption, long carrier diffusion length, and adjustable bandgap. Hussain et al. (2023) investigated the PCE of perovskite solar cells using a two-step prediction methodology based on 42,000 experimental data points. Workman et al. (2020) explored the prediction of the photoelectronic properties of perovskite solar cells using ML methods such as ANN, SVR, and RF, with a dataset containing features such as chemical composition, structural properties, and photovoltaic performance.
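For orientation, the PCE of a solar cell is conventionally obtained from its current-voltage characteristics as PCE = (Voc × Jsc × FF) / Pin, where Voc is the open-circuit voltage, Jsc the short-circuit current density, FF the fill factor, and Pin the incident power density. The sketch below assumes standard test illumination of 100 mW/cm² and uses hypothetical device values that are not taken from the cited studies.

```python
def power_conversion_efficiency(v_oc, j_sc, fill_factor, p_in=100.0):
    """PCE in percent.

    v_oc: open-circuit voltage in V
    j_sc: short-circuit current density in mA/cm^2
    fill_factor: dimensionless, between 0 and 1
    p_in: incident power density in mW/cm^2 (100 assumed for standard illumination)
    """
    if p_in <= 0:
        raise ValueError("incident power density must be positive")
    if not 0.0 <= fill_factor <= 1.0:
        raise ValueError("fill factor must lie between 0 and 1")
    return 100.0 * v_oc * j_sc * fill_factor / p_in

# Hypothetical device parameters for illustration.
pce = power_conversion_efficiency(v_oc=1.10, j_sc=24.0, fill_factor=0.80)
print(f"PCE = {pce:.2f} %")
```

Since Voc, Jsc, and FF each depend on composition, film quality, and device architecture, ML models for PCE typically take all three kinds of features as input rather than the absorber bandgap alone.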

6.5 Crystal structure prediction

The properties of perovskite materials are significantly affected by the crystal structure. The diversity of crystal structures can be used to optimize optoelectronic properties, such as bandgap, carrier mobility, and light absorption capacity, by adjusting the elemental combinations at the A, B, and X sites to meet different application requirements. The stability of the crystal structure also directly affects the durability of the material in thermal, chemical and mechanical environments, and thus its reliability and service life in practical applications. In addition, the designability of crystal structures enables their widespread application in the energy and photovoltaic fields. Zhang and Xu (2021) used Gaussian Process Regression (GPR) to predict the lattice constants of cubic perovskite oxides and halides from data on 149 cubic perovskites. Chang et al. (2024) applied the ShotgunCSP method to predict stable or metastable crystal structures based on 3,354 virtually generated crystal structures. Zhang L. et al. (2023) used RF with Bond-valence Vector Sum (BVVS) features to automatically identify the crystal structures in a data set of 1,647 perovskites. These studies offer new approaches for determining perovskite structures rapidly, efficiently, and economically, thereby avoiding expensive DFT calculations, reducing computational load, and accelerating the design and discovery of new materials.
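To make the GPR idea concrete, the following is a minimal sketch of Gaussian process regression with a squared-exponential kernel, written with NumPy. It is not the implementation of Zhang and Xu (2021): the descriptors (ionic radii), the kernel settings and the data are assumptions, and the toy target uses the hard-sphere relation a = 2(r_B + r_X) for an ideal cubic perovskite plus noise.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of a and b."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


def gpr_predict(x_train, y_train, x_test, noise=1e-3, length_scale=1.0):
    """Return the GP posterior mean and standard deviation at x_test."""
    y_mean = y_train.mean()  # centre targets so a zero-mean prior is reasonable
    k_tt = rbf_kernel(x_train, x_train, length_scale)
    k_tt += noise * np.eye(len(x_train))  # observation noise / jitter
    k_st = rbf_kernel(x_test, x_train, length_scale)
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train - y_mean))
    mean = k_st @ alpha + y_mean
    v = np.linalg.solve(chol, k_st.T)
    prior_var = rbf_kernel(x_test, x_test, length_scale).diagonal()
    var = prior_var - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))


rng = np.random.default_rng(0)
# Synthetic descriptors (assumed): ionic radii r_A, r_B, r_X in angstrom.
radii = rng.uniform([1.0, 0.5, 1.3], [1.9, 1.0, 2.2], size=(60, 3))
# Toy target, not real data: hard-sphere lattice constant plus small noise.
lattice = 2.0 * (radii[:, 1] + radii[:, 2]) + rng.normal(0.0, 0.02, 60)

x_train, y_train = radii[:45], lattice[:45]
x_test, y_test = radii[45:], lattice[45:]
mean, std = gpr_predict(x_train, y_train, x_test)
mae = np.mean(np.abs(mean - y_test))
print(f"test MAE = {mae:.3f} angstrom")
```

Besides the prediction itself, the posterior standard deviation gives a per-sample uncertainty, which is one reason GPR is attractive for small data sets of the size used in the cited study.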

6.6 Prediction of ferroelectric properties

The ferroelectric properties of perovskite materials originate from the displacement of ions or the arrangement of electric dipoles within their crystal structure, and manifest as spontaneous polarization and polarization reversal. These characteristics give the materials application potential in fields such as non-volatile memory and sensors, so predicting them is crucial for understanding structure-property relationships and optimizing material design. Lu et al. (2024) constructed a ferroelectricity classification model for ABO3-type perovskites, which can predict properties such as the bandgap and dielectric loss of perovskite materials. Such a model can guide the design of perovskite materials with specific properties for different application scenarios. To identify new inorganic oxide perovskites that may possess ferroelectric properties, Min and Cho (2020) proposed an active learning model to estimate the bandgap and formation energy. The model was shown to be reliable with an initial dataset covering no more than 10% of the entire material space. By actively learning from 30% of the entire dataset, it successfully screened more than 90% of the satisfactory materials.

6.7 Defect performance prediction

The properties of perovskite materials are significantly affected by the nature of their defects. Point defects, line defects, and surface defects are the common types, and they can affect the material’s electrical, optical, and mechanical properties. Modulating defects by optimizing synthesis conditions or employing doping techniques can enhance the properties of the materials and strengthen their potential for applications in optoelectronics. Wu et al. (2023) developed a machine learning method to predict and evaluate defect transition energy levels, and constructed multiple machine learning models to examine how well each captures perovskite defect energy levels. They found that combining artificial intelligence techniques can accurately predict the stability of perovskite defects and their physicochemical properties, providing theoretical guidance for developing efficient, high-performance perovskite-based optoelectronic devices.

7 Challenges and future directions

This section first analyses the existing issues in predictive research on perovskite materials, and then looks forward to cutting-edge technological directions such as multimodal learning, robotic-AI closed-loop systems, and the collaboration of quantum computing and AI, to provide insights for overcoming challenges and accelerating the development of new materials.

7.1 Current limitations

The core challenge in perovskite research lies in low data quality and the lack of standards (Valencia et al., 2025). Experimental processes and equipment vary significantly, DFT calculations carry systematic biases (Li Z. et al., 2019), and noisy or missing data cause models to learn spurious relationships (Alzraiee and Niswonger, 2024; Ahmad et al., 2024; Gupta and Gupta, 2019; Talapatra et al., 2021). A major limitation of current AI-based perovskite research is its excessive reliance on DFT-derived data: machine learning models trained on inaccurately computed properties inherit and amplify these DFT biases.

DFT underestimates the bandgap of halide perovskites by 0.8–1.2 eV. As a result, ML models trained solely on DFT data systematically underestimate the experimental bandgap, and any bandgap-related trends they learn are correspondingly less accurate. DFT also fails to capture processes on long timescales, so little training data exists for machine learning models aimed at long-term stability, which further exacerbates the ‘small sample’ problem (Sum et al., 2016; Song et al., 2015) in this area. Experiments are costly and time-consuming, and the resulting small samples are insufficient to support machine learning; transfer learning can alleviate data scarcity (Parker Tian et al., 2024), but cross-domain adaptation remains challenging (Jiang et al., 2022b). Although deep learning can be highly accurate in its predictions, it operates as a ‘black box’ and does not reveal the underlying physical mechanisms (Yao et al., 2025; Obada et al., 2023), which makes it difficult to use for guiding experimental and theoretical developments.

7.2 Frontier technology outlook

Multimodal learning that fuses experimental and computational data (Shao et al., 2025) can reveal structure-property relationships and enhance prediction reliability. As illustrated in Figure 12, active learning (Du et al., 2017) and automated experiments form a robot-AI closed-loop system (Li et al., 2020; Moon et al., 2024; Sun et al., 2019), which accelerates iteration, reduces costs, and improves efficiency through high-throughput synthesis and characterisation and AI-driven screening of optimal formulations (Omidvar et al., 2024). To address the computational biases and data limitations of DFT discussed in Section 7.1, multi-scale computational data fusion emerges as a core direction; its implementation relies on multi-method data fusion and automated multi-scale data generation.

Figure 12. Robotics-AI closed-loop intelligent R&D cycle technology architecture. The flowchart starts with AI generation of candidate materials, followed by robotic synthesis and automated characterization; if performance meets the criteria, optimized solutions are output, and if not, active learning updates the models.
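The control flow of the closed loop in Figure 12 can be sketched in a few lines. This is a toy illustration under stated assumptions, not a description of any cited platform: the "robot" is replaced by a simulated measurement, the candidate is a single hypothetical composition parameter x, and the "AI" is a simple nearest-neighbour surrogate with a distance-based exploration bonus.

```python
import math
import random


def simulated_measurement(x, rng):
    """Stand-in for robotic synthesis plus automated characterization.

    Hypothetical response: a noisy score that peaks at composition x = 0.63.
    """
    return math.exp(-((x - 0.63) ** 2) / 0.02) + rng.gauss(0.0, 0.01)


def acquisition(x, observed, explore=0.5):
    """Score a candidate: nearest measured value plus a distance bonus."""
    nearest_x, nearest_y = min(observed, key=lambda xy: abs(xy[0] - x))
    return nearest_y + explore * abs(nearest_x - x)


def closed_loop(target=0.95, budget=40, seed=0):
    """Run the generate / synthesize / characterize / update cycle."""
    rng = random.Random(seed)
    pool = [i / 200 for i in range(201)]  # untested candidate compositions
    observed = []
    for x in (0.0, 0.5, 1.0):  # small initial data set
        pool.remove(x)
        observed.append((x, simulated_measurement(x, rng)))
    for step in range(1, budget + 1):
        # 1. AI generation: propose the most promising untested candidate.
        x_next = max(pool, key=lambda x: acquisition(x, observed))
        pool.remove(x_next)
        # 2-3. Synthesis and characterization (simulated here).
        y_next = simulated_measurement(x_next, rng)
        # 4. Decision: output the candidate if it meets the criterion ...
        if y_next >= target:
            return x_next, y_next, step
        # ... otherwise feed the result back so the next proposal changes.
        observed.append((x_next, y_next))
    best_x, best_y = max(observed, key=lambda xy: xy[1])
    return best_x, best_y, budget


x_best, score, steps = closed_loop()
print(f"accepted x = {x_best:.3f} (score {score:.3f}) after {steps} iterations")
```

In a real system the surrogate would typically be a probabilistic model with a principled acquisition function, and the stopping criterion would combine several performance targets; the loop structure, however, stays the same.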

Multiscale computational data integration represents a core strategy for enhancing the accuracy of machine learning models. Two promising directions emerge: firstly, calibrating lower-level data (e.g., DFT, DFTB) using higher-level methods (e.g., GW, TD-DFT); secondly, combining data from diverse methodologies to address multi-property tasks. In the future, high-throughput platforms will integrate DFT, DFTB, and ReaxFF into a unified workflow to address the scarcity of long-time-scale DFT data. For instance, the ‘DFT→DFTB→ReaxFF’ pipeline can rapidly generate multidimensional training data, alleviating the small sample size issue in perovskite research. The synergistic application of quantum computing and artificial intelligence can expedite the discovery of novel material systems (Chu et al., 2020).
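The first direction, calibrating lower-level data against a higher-level method, can be illustrated with a simple linear correction fitted on a few compounds computed at both levels. This is a minimal sketch: all bandgap values below are invented for illustration (with offsets chosen to be of the order of the underestimation discussed in Section 7.1), and practical schemes often learn a descriptor-dependent correction rather than a single line.

```python
def fit_linear_correction(low, high):
    """Least-squares fit of high ~ slope * low + intercept."""
    n = len(low)
    mean_l = sum(low) / n
    mean_h = sum(high) / n
    cov = sum((l - mean_l) * (h - mean_h) for l, h in zip(low, high))
    var = sum((l - mean_l) ** 2 for l in low)
    slope = cov / var
    return slope, mean_h - slope * mean_l


# Hypothetical paired bandgaps (eV) for compounds computed at both levels.
low_fidelity_pairs = [0.6, 1.0, 1.4, 1.9, 2.5]   # e.g. a lower-level method
high_fidelity_pairs = [1.5, 2.0, 2.4, 3.0, 3.6]  # e.g. a higher-level method

slope, intercept = fit_linear_correction(low_fidelity_pairs, high_fidelity_pairs)

# A larger, cheap low-fidelity set (also synthetic) is then calibrated
# before being used as training data.
low_fidelity_bulk = [0.8, 1.2, 1.7, 2.2]
calibrated = [slope * gap + intercept for gap in low_fidelity_bulk]

for raw, corrected in zip(low_fidelity_bulk, calibrated):
    print(f"{raw:.2f} eV -> {corrected:.2f} eV")
```

The design choice here is to spend the expensive calculations on a small paired set and reuse the cheap data at scale, which is the same idea that underlies the multi-level pipelines mentioned above.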

8 Conclusion

AI has evolved from an early auxiliary tool into the core driving force of perovskite research (Wu et al., 2020): shifting from passive fitting to active design, leading the ‘design-synthesis-verification’ intelligent closed loop (Xue et al., 2025), and achieving global optimisation of efficiency and stability. The complexity and multidisciplinary nature of perovskite materials research make interdisciplinary collaboration indispensable (Zhu and Hu, 2023). This change requires deep collaboration among materials, algorithms, and experiments: materials scientists pose questions and prepare samples, algorithm engineers build optimisation models, and experimental teams validate the results and feed data back to form credible iterations. Interdisciplinary cooperation (Bi et al., 2024) will be the key to further breakthroughs.

Author contributions

BW: Writing – original draft. JW: Methodology, Writing – review and editing. LL: Writing – review and editing. DW: Writing – review and editing.

Funding

The authors declare that financial support was received for the research and/or publication of this article. This research was funded by the Innovation Capability Support Plan Project of Shaanxi Province (No. 2024ZC-KJXX-020) and the Shaanxi Association for Science and Technology Youth Talent Support Program (No. 20230520).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The authors declare that no Generative AI was used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Ahmad, A.-F., Sayeed, M. S., Alshammari, K., and Ahmed, I. (2024). Impact of missing values in machine learning: a comprehensive analysis. CMC Comput. Mater. & Continua. doi:10.48550/arXiv.2410.08295

Alam, T., and Prasad, A. (2025). Artificial intelligence in perovskite-based materials for energy applications. LatIA 3 (125), 125. doi:10.62486/latia2025125

Alhashmi, A., Kanoun, M. B., and Goumri-Said, S. (2023). Machine learning for halide perovskite materials ABX3 (B = Pb, X = I, Br, Cl): assessment of structural properties and band gap engineering for solar energy. Materials 16 (7), 2657. doi:10.3390/ma16072657

Alzraiee, A. H., and Niswonger, R. G. (2024). A probabilistic approach to training machine learning models using noisy data. Environ. Modell. Softw. 179, 106133. doi:10.1016/j.envsoft.2024.106133

Amini, M.-R., Feofanov, V., Pauletto, L., Hadjadj, L., Devijver, E., and Maximov, Y. (2023). Self-training: a survey.

Benidis, K., Rangapuram, S. S., Flunkert, V., Wang, Y., Maddix, D., Turkmen, C., et al. (2022). Deep learning for time series forecasting: tutorial and literature survey. ACM Comput. Surv. 55 (6), 1–36. doi:10.1145/3533382

Bhattacharya, S., and Roy, A. (2023). Linking stability with molecular geometries of perovskites and lanthanide richness using machine learning methods. Comput. Mater. Sci. 231, 112581. doi:10.1016/j.commatsci.2023.112581

Bi, Z. X., Bai, Y. F., Shi, Y., Sun, T., Wu, H., Zhang, H. C., et al. (2024). Intrinsic exciton transport and recombination in single-crystal lead bromide perovskite. J. Am. Chem. Soc. 19 (21), 19989–20000. doi:10.1021/acsnano.5c03274

Boubchir, M., Boubchir, R., and Aourag, H. (2022). The principal component analysis as a tool for predicting the mechanical properties of perovskites and inverse perovskites. Chem. Phys. Lett. 798, 139615. doi:10.1016/j.cplett.2022.139615

Breiman, L. (2001). Random forests. Mach. Learn. 45 (1), 5–32. doi:10.1023/a:1010933404324

Cai, T. T., and Ma, R. (2022). Theoretical foundations of t-SNE for visualizing high-dimensional clustered data. J. Mach. Learn. Res. 23, 1–54. doi:10.48550/arXiv.2105.07536

Cai, X., Zhang, Y., Shi, Z., Chen, Y., Xia, Y., Yu, A., et al. (2022). Discovery of lead-free perovskites for high-performance solar cells via machine learning: ultrabroadband absorption, low radiative combination, and enhanced thermal conductivities. Adv. Sci. 9 (1), 2103648. doi:10.1002/advs.202103648

Chang, L., Tamaki, H., Yokoyama, T., Wakasugi, K., Yotsuhashi, S., Kusaba, M., et al. (2024). Shotgun crystal structure prediction using machine-learned formation energies. Npj Comput. Mater. 10 (1), 298. doi:10.1038/s41524-024-01471-8

Chen, J. L., Feng, M. J., Zha, C. Y., Shao, C. R., Zhang, L. H., and Wang, L. (2022). Machine learning-driven design of promising perovskites for photovoltaic applications: a review. Surf. Int. 35, 102470. doi:10.1016/j.surfin.2022.102470

Chen, L. P., Wang, X. C., Xia, W. J., and Liu, C. H. (2022). PSO-SVR predicting for the Ehull of ABO3-type compounds to screen the thermodynamic stable perovskite candidates based on multi-scale descriptors. Comput. Mat. Sci. 211, 111435. doi:10.1016/j.commatsci.2022.111435

Chen, C., Maqsood, A., and Jacobsson, T. J. (2023). The role of machine learning in perovskite solar cell research. J. Alloy Compd. 960, 170824. doi:10.1016/j.jallcom.2023.170824

Chen, Z., Hoye, R. L. Z., Yip, H.-L., Fiuza-Maneiro, N., López-Fernández, I., Otero-Martínez, C., et al. (2024a). Roadmap on perovskite light-emitting diodes. J. Phys. Photonics 6 (3), 032501. doi:10.1088/2515-7647/ad46a6

Chen, Z., Wang, J., Li, C., Liu, B., Luo, D., Min, Y., et al. (2024b). Highly versatile and accurate machine learning methods for predicting perovskite properties. J. Phys. Chem. C. 12 (38), 15444–15453. doi:10.1039/d4tc02268h

Chen, M., Yin, Z., Shan, Z., Zheng, X., Liu, L., Dai, Z., et al. (2024). Application of machine learning in perovskite materials and devices: a review. J. Energy Chem. 94, 254–272. doi:10.1016/j.jechem.2024.02.035

Cheng, J. R., He, P. F., Li, Y. X., and Lei, Y. M. (2025). Data and model driven intelligent computing framework for perovskite materials. Mater. China. 44 (04), 309–317. doi:10.7502/j.issn.1674-3962.202412002

Chu, W. B., Zheng, Q. J., Prezhdo, O. V., Zhao, J., and Saidi, W. A. (2020). Low-frequency lattice phonons in halide perovskites explain high defect tolerance toward electron-hole recombination. Sci. Adv. 6, eaaw7453. doi:10.1126/sciadv.aaw7453

Dean, J., Scheffler, M., Purcell, T. A. R., Barabash, S. V., Bhowmik, R., and Bazhirov, T. (2021). Interpretable machine learning for materials design. J. Mater. Res. 38 (6). doi:10.1557/s43578-023-01164-w

Di, J., Lin, Z., Su, J., Wang, J., Zhang, J., Liu, S., et al. (2021). Two-dimensional (C6H5C2H4NH3)2PbI4 perovskite single crystal resistive switching memory devices. IEEE EDL 42 (3), 327–330. doi:10.1109/led.2021.3053009

Dong, X., Cheng, P., Guo, P. Y., Liu, G. H., Li, Y. Q., Wu, Z. B., et al. (2021). Ion migration in perovskite field-effect transistors (invited). ACTA PHOTONICA SIN. 50 (10), 1016002. doi:10.3788/gzxb20215010.1016002

Du, B., Wang, Z. M., Zhang, L. F., Zhang, L. P., Liu, W., Shen, J. L., et al. (2017). Exploring representativeness and informativeness for active learning. IEEE T. Cybern. 47 (1), 14–26. doi:10.1109/tcyb.2015.2496974

Emery, A. A., and Wolverton, C. (2017). High-throughput DFT calculations of formation energy, stability and oxygen vacancy formation energy of ABO3 perovskites. Sci. Data. 4, 170153. doi:10.1038/sdata.2017.153

Fitrianto, A., and Xin, S. H. (2022). Comparisons between robust regression approaches in the presence of outliers and high leverage points. Barekeng J. Math. App. 16 (1), 243–252. doi:10.30598/barekengvol16iss1pp241-250

Friedman, J. H. (2001). Greedy function approximation: a gradient boosting machine. Ann. Stat. 29 (5), 1189–1232. doi:10.1214/aos/1013203451

Ghezelbash, R., Daviran, M., Maghsoudi, A., and Hajihosseinlou, M. (2025). Density based spatial clustering of applications with noise and fuzzy C-means algorithms for unsupervised mineral prospectivity mapping. Earth Sci. Inf. 18 (1), 217. doi:10.1007/s12145-025-01708-0

Ghosh, S., and Chowdhury, J. (2024). Predicting band gaps of ABN3 perovskites: an account from machine learning and first-principle DFT studies. RSC Adv. 14, 6385–6397. doi:10.1039/d4ra00402g

Gu, G. H., Jang, J., Noh, J., Walsh, A., and Jung, Y. (2022). Perovskite synthesizability using graph neural networks. Npj Comput. Mater. 8 (1), 71. doi:10.1038/s41524-022-00757-z

Guan, Z., Zhang, H., and Yang, G. (2025). Advances in perovskite lasers. J. Semicond. 46 (4), 041401. doi:10.1088/1674-4926/24100029

Guo, H. J., An, S. L., Meng, J., Ren, S. X., Wang, W. W., Liang, Z. S., et al. (2023). Research progress of photoelectric resistive switching mechanism of halide perovskite. J. Inorg. Mat. 38 (9), 1005–1016. (Chinese Journal). doi:10.15541/jim20230132

Guo, Z., and Lin, B. (2021). Machine learning stability and band gap of lead-free halide double perovskite materials for perovskite solar cells. Sol. Energy 228, 689–699. doi:10.1016/j.solener.2021.09.030

Gupta, S., and Gupta, A. (2019). Dealing with noise problem in machine learning data-sets: a systematic review. Proc. Comput. Sci. 161, 466–474. doi:10.1016/j.procs.2019.11.146

Hashimoto, Y., Tomai, T., Jia, X., and Li, H. (2025). Materials map integrating experimental and computational data through graph-based machine learning for enhanced materials discovery. arXiv. doi:10.48550/ARXIV.2503.07378

Hastie, T. (2020). Ridge regularization: an essential concept in data science. Technometrics 62 (4), 426–433. doi:10.1080/00401706.2020.1791959

Hayee, F., Datye, I. M., and Kini, R. (2016). Final report: data-driven prediction of band gap of materials.

Henderson, I., Noble, P., and Roustant, O. (2023). Covariance models and Gaussian process regression for the wave equation. Application to related inverse problems. J. Comput. Phys. 494, 112519. doi:10.1016/j.jcp.2023.112519

Hogg, D. W., and Villar, S. (2021). Fitting very flexible models: linear regression with large numbers of parameters. PASP 133 (093001), 1–18. doi:10.1088/1538-3873/ac20ac

Hussain, W., Sawar, S., and Sultan, M. (2023). Leveraging machine learning to consolidate the diversity in experimental results of perovskite solar cells. RSC Adv. 13, 22529–22537. doi:10.1039/d3ra02305b

Irshad, M., Ain, Q. T., Zaman, M., Aslam, M. Z., Kousar, N., Asim, M., et al. (2022). Photocatalysis and perovskite oxide-based materials: a remedy for a clean and sustainable future. RSC Adv. 12, 7009–7039. doi:10.1039/d1ra08185c

Islam, J., and Hossain, A. K. M. A. (2020). Semiconducting to metallic transition with outstanding optoelectronic properties of CsSnCl3 perovskite under pressure. Sci. Rep. 10 (1), 14391. doi:10.1038/s41598-020-71223-3

Jacobs, R., Liu, J., Abernathy, H., and Morgan, D. (2024). Machine learning design of perovskite catalytic properties. Adv. Energy Mat. 14, 2303684. doi:10.1002/aenm.202303684

Jacobsson, T. J., Pazoki, M., Hagfeldt, A., and Edvinsson, T. (2015). Goldschmidt’s rules and strontium replacement in lead halogen perovskite solar cells: theory and preliminary experiments on CH3NH3SrI3. J. Phys. Chem. C. 119 (45), 25673–25683. doi:10.1021/acs.jpcc.5b06436

Jiang, J., Xiong, M., Fan, K., Bao, C., Xin, D., Pan, Z., et al. (2022a). Synergistic strain engineering of perovskite single crystals for highly stable and sensitive X-ray detectors with low-bias imaging and monitoring. Nat. Photonics. 16, 575–581. doi:10.1038/s41566-022-01024-9

Jiang, J., Shu, Y., Wang, J., and Long, M. (2022b). Transferability in deep learning: a survey. arXiv. doi:10.48550/arXiv.2201.05867

Jiang, C., He, H., Guo, H., Zhang, X., Han, Q., Weng, Y., et al. (2024). Transfer learning guided discovery of efficient perovskite oxide for alkaline water oxidation. Nat. Commun. 15 (1), 6301. doi:10.1038/s41467-024-50605-5

Jin, L. Z., Du, Z. J., Shu, L., Cen, Y., Xu, Y. F., Mei, Y. F., et al. (2025). Transformer-generated atomic embeddings to enhance prediction accuracy of crystal properties with machine learning. Nat. Commun. 16 (1), 1210. doi:10.1038/s41467-025-56481-x

Kumar, A., Verma, A. S., and Bhardwaj, S. R. (2008). Prediction of formability in perovskite-type oxides. Open Appl. Phys. 1, 11–19. doi:10.2174/1874183500801010011

Katsikioti, D. (2024). Knock mitigation study on alternative fuel heavy duty engines. Master’s thesis in mobility engineering. Gothenburg, Sweden: Department of Mechanics and Maritime Sciences, Chalmers University of Technology.

Khozeimeh, F., Sharifrazi, D., Izadi, N. H., Joloudari, J. H., Shoeibi, A., Alizadehsani, R., et al. (2022). RF-CNN-F: Random forest with convolutional neural network features for coronary artery disease diagnosis based on cardiac magnetic resonance. Sci. Rep. 12 (1), 11178. doi:10.1038/s41598-022-15374-5

Kumar, P. S., and Sivamani, S. (2021). Numerical analysis and implementation of artificial neural network algorithm for nonlinear function. Int. J. Inf. Technol. 13 (3), 2059–2068. doi:10.1007/s41870-021-00743-6

Kusuma, F. J., Widianto, E., Wahyono, Santoso, I., Sholihun, Absor, M. A. U., et al. (2025). Multi-properties prediction of perovskite materials using machine learning and meta-heuristic feature selection. Sol. Energy 286, 113189. doi:10.1016/j.solener.2024.113189

Landini, E., Reuter, K., and Oberhofer, H. (2022). Machine-learning based screening of lead-free halide double perovskites for photovoltaic applications. arXiv. doi:10.48550/arXiv.2208.12736

Lang, Y. F., Zou, D. F., Xu, Y., Jiang, S. L., Zhao, Y. Q., and Ang, Y. S. (2024). Electronic characteristics of the two-dimensional van der Waals ferroelectric α-In2Se3/Cs3Bi2I9 heterostructures. Appl. Phys. Lett. 124 (5), 052903. doi:10.1063/5.0189709

Laufer, F., Ziegler, S., Schackmar, F., Moreno Viteri, E. A., Götz, M., Debus, C., et al. (2023). Process insights into perovskite thin-film photovoltaics from machine learning with in situ luminescence data. Sol. RRL 7 (5), 2201114. doi:10.1002/solr.202201114

Leccisi, E., and Fthenakis, V. (2024). Life-cycle human- and eco-toxicity assessment of emerging lead-based perovskite compared to conventional photovoltaic panels. MRS Bull. 49 (12), 1240–1250. doi:10.1557/s43577-024-00812-8

Li, Y., and Wang, X. (2024). Time-dependent DFT-driven machine learning for optical absorption spectra of perovskites. ACS Appl. Mat. 16, 8902–8910.

Li, W., Jacobs, R., and Morgan, D. (2018). Predicting the thermodynamic stability of perovskite oxides using machine learning models. Comput. Mater. Sci. 150, 454–463. doi:10.1016/j.commatsci.2018.04.033

Li, X., Dan, Y., Dong, R., Cao, Z., Niu, C., Song, Y., et al. (2019). Computational screening of new perovskite materials using transfer learning and deep learning. Appl. Sci. 9 (24), 5510. doi:10.3390/app9245510

Li, Z., Xu, Q., Sun, Q., Hou, Z., and Yin, W.-J. (2019). Thermodynamic stability landscape of halide double perovskites via high-throughput computing and machine learning. Adv. Funct. Mat. 29 (9), 1807280. doi:10.1002/adfm.201807280

Li, J., Li, J., Liu, R., Tu, Y., Li, Y., Cheng, J., et al. (2020). Autonomous discovery of optically active chiral inorganic perovskite nanocrystals through an intelligent cloud lab. Nat. Commun. 11 (1), 2046. doi:10.1038/s41467-020-15728-5

Li, R., Deng, Q., Tian, D., Zhu, D., and Lin, B. (2021). Predicting perovskite performance with multiple machine-learning algorithms. Crystals 11 (7), 818. doi:10.3390/cryst11070818

Li, Y., Lu, Y., Huo, X., Wei, D., Meng, J., Dong, J., et al. (2021). Bandgap tuning strategy by cations and halide ions of lead halide perovskites learned from machine learning. RSC Adv. 11, 15688–15694. doi:10.1039/d1ra03117a

Li, S., Wang, Y., Yang, M., Miao, J., Lin, K., Li, Q., et al. (2022). Ferroelectric thin films: performance modulation and application. Mat. Adv. 3 (1), 5735–5752. doi:10.1039/d2ma00381c

Li, Y., Zhu, R., Wang, Y., Feng, L., and Liu, Y. (2023). Center-environment deep transfer machine learning across crystal structures: from spinel oxides to perovskite oxides. Npj Comput. Mater. 9 (1), 109. doi:10.1038/s41524-023-01068-7

Li, M., Wang, X., Xie, J., Wang, X., Zou, H., Yang, X., et al. (2023). Theoretical design of optoelectronic semiconductors. Sci. Bull. 68 (17), 2221–2238. doi:10.1360/tb-2022-1217

Liang, Y., Gao, X., Li, C., Yang, C., Cai, X. H., Gong, Y., et al. (2025). Enhanced interfacial exciton transport in mixed 2D/3D perovskites approaching bulk 3D counterparts. ACS Nano 19 (19), 18833–18842. doi:10.1021/acsnano.5c04246

Liu, Y., Ma, Z., Zhang, J., He, Y., Dai, J., Li, X., et al. (2025). Light-emitting diodes based on metal halide perovskite and perovskite related nanocrystals. Adv. Mat. 37, 2415606. doi:10.1002/adma.202415606

Liu, T. H., Yuan, Z. Q., Wang, L. X., Shan, C., Zhang, Q. L., Chen, H., et al. (2025). Chelated tin halide perovskite for near-infrared neuromorphic imaging array enabling object recognition and motion perception. Nat. Commun. 16 (1), 4261. doi:10.1038/s41467-025-59624-2

Lu, H.-D., Han, H.-J., and Liu, J. (2021). Simulation and property calculation for FA1-xCsxPbI3-yBry: structures and optoelectronical properties. Acta Phys. Sin. 70 (3), 036301. doi:10.7498/aps.70.20201387

Lu, W., Wu, Y., Liu, T., Lu, T., Ji, X., and Xing, L. (2024). Machine learning-based materials design. J. Henan Norm. Univ. 52 (4), 120–131. (Chinese Journal). doi:10.16366/j.cnki.1000-2367.2023.11.16.0003

Ma, Y., Xu, X., Li, T., Wang, Z., Li, N., Zhao, X., et al. (2025). Amplified narrowband perovskite photodetectors enabled by independent multiplication layers for anti-interference light detection. Sci. Adv. 11, eadq1127. doi:10.1126/sciadv.adq1127

Ma, X. T., Yao, X. J., Zhao, Y. Z., Zhu, G. Y., and Zhong, H. L. (2025). Life cycle toxicity and reduction potential analysis of perovskite photovoltaic technology. J. Environ. Manage. 393, 127170. doi:10.1016/j.jenvman.2025.127170

Maietta, I., Otero-Martínez, C., Fernández, S., Sánchez, L., González-Fernández, Á., Polavarapu, L., et al. (2025). The toxicity of lead and lead-free perovskite precursors and nanocrystals to human cells and aquatic organisms. Adv. Sci. 12 (13), e2415574. doi:10.1002/advs.202415574

Mangi, M. A., Elahi, H., Ali, A., Jabbar, H., Aqeel, A. B., Farrukh, A., et al. (2025). Applications of piezoelectric-based sensors, actuators, and energy harvesters. Sens. Actuators Rep. 9, 100302. doi:10.1016/j.snr.2025.100302

Mao, L., and Xiang, C. Y. (2025). A comprehensive review of machine learning applications in perovskite solar cells: materials discovery, device performance, process optimization and systems integration. Mater. Today Energy 47, 101742. doi:10.1016/j.mtener.2024.101742

Min, K., and Cho, E. (2020). Accelerated discovery of potential ferroelectric perovskite via active learning. J. Mat. Chem. C. 8 (23), 7866–7872. doi:10.1039/d0tc00985g

Moon, J., Beker, W., Siek, M., Kim, J., Lee, H. S., Hyeon, T., et al. (2024). Active learning guides discovery of a champion four-metal perovskite oxide for oxygen evolution electrocatalysis. Nat. Mat. 23, 108–115. doi:10.1038/s41563-023-01707-w

Obada, D. O., Okafor, E., Abolade, S. A., Ukpong, A. M., Dodoo-Arhin, D., and Akande, A. (2023). Explainable machine learning for predicting the band gaps of ABX3 perovskites. Sci. Semicond. Process. 161, 107427. doi:10.1016/j.mssp.2023.107427

Omidvar, M., Zhang, H., Ihalage, A. A., Saunders, T. G., Giddens, H., Forrester, M., et al. (2024). Accelerated discovery of perovskite solid solutions through automated materials synthesis and characterization. Nat. Commun. 15, 6554. doi:10.1038/s41467-024-50884-y

Parker Tian, S. I., Ren, Z., Venkataraj, S., Cheng, Y., Bash, D., Oviedo, F., et al. (2024). Correction: tackling data scarcity with transfer learning: a case study of thickness characterization from optical spectra of perovskite thin films. Digit. Discov. 3, 1068. doi:10.1039/d4dd90015d

Pathak, R., Anoop, G., and Samanta, S. (2025). Advancements in free-standing ferroelectric films: paving the way for transparent flexible electronics. J. Compos. Sci. 9 (2), 71. doi:10.3390/jcs9020071

Peng, L. L., Qian, J. J., Jia, X. H., Wang, Y. Y., and Fan, G. F. (2024). Machine learning-based prediction and influencing factors on the performance of perovskite solar cells. Sus. Ener. 14 (4), 53–64. doi:10.12677/se.2024.144005

Pilania, G., Mannodi-Kanakkithodi, A., Uberuaga, B. P., Ramprasad, R., Gubernatis, J. E., and Lookman, T. (2016). Machine learning bandgaps of double perovskites. Sci. Rep. 6 (1), 19375. doi:10.1038/srep19375

Ren, C., Wu, Y., Zou, J., and Cai, B. (2024). Employing the interpretable ensemble learning approach to predict the bandgaps of the halide perovskites. Materials. 17 (11), 2686. doi:10.3390/ma17112686

Rahman, M. A., Shahjahan, M., Zhang, Y., Wu, R., and Harel, E. (2025). Chemical space-property predictor model of perovskite materials by high-throughput synthesis and artificial neural networks. Chem 11 (4), 102360. doi:10.1016/j.chempr.2024.10.027

Raihan, A. S., Khosravi, H., Das, S., and Ahmed, I. (2023). Accelerating material discovery with a threshold-driven hybrid acquisition policy-based Bayesian optimization. arXiv. doi:10.48550/arXiv.2311.09591

Sabagh Moeini, A., Shariatmadar Tehrani, F., and Naeimi-Sadigh, A. (2024). Machine learning-enhanced band gaps prediction for low-symmetry double and layered perovskites. Sci. Rep. 14 (1), 26736. doi:10.1038/s41598-024-77081-7

Saha, U., Debnath, K., and Satapathi, S. (2021). Screening of potential double perovskite materials for photovoltaic applications using agglomerative hierarchical clustering. arXiv. doi:10.48550/arXiv.2111.07557

Shao, S., Yan, L., Li, J., Zhang, Y., Zhang, J., Kim, H. W., et al. (2025). Multimodal deep learning-driven exploration of lanthanide-based perovskite oxide semiconductors for ultra-sensitive detection of 2-butanone. Chem. Eng. J. 515, 162154. doi:10.1016/j.cej.2025.162154

Sharma, S., Ward, Z. D., Bhimani, K., Sharma, M., Quinton, J., Rhone, T. D., et al. (2023). Machine learning-aided band gap engineering of BaZrS3 chalcogenide perovskite. ACS Appl. Mat. Interfaces 15 (15), 18962–18972. doi:10.1021/acsami.3c00618

Shi, Y., Deng, X., Gan, Y., Xu, L., Zhang, Q., and Xiong, Q. (2025). Ten years of perovskite lasers. Adv. Mat. 37, 2413559. doi:10.1002/adma.202413559

Siad, A. B., Riane, H., Siad, M. B., Dahou, F. Z., Allouche, A., and Baira, M. (2024). Elevating energy device potential: exploring optoelectronic and thermoelectric advantages in stable double perovskites K2NaInX6 (X = F, Cl, Br, I) via ab initio analysis. J. Mat. Sci. 59 (5), 1989–2007. doi:10.1007/s10853-023-09229-1

Singh, S., Hamid, Z., Babu, R., Gómez-Graña, S., Hu, X., McCulloch, I., et al. (2025). Halide perovskite photocatalysts for clean fuel production and organic synthesis: opportunities and challenges. Adv. Mat. 37, 2419603. doi:10.1002/adma.202419603

Song, T.-B., Chen, Q., Zhou, H., Jiang, C., Wang, H.-H., Yang, Y., et al. (2015). Perovskite solar cells: film formation and properties. J. Mat. Chem. A. 3, 9032–9050. doi:10.1039/c4ta05246c

Sudha Priyanga, G. A., Manoj, N. M. B., and Nagappan, N. C. (2022). Prediction of nature of band gap of perovskite oxides (ABO3) using a machine learning approach. J. Materiomics. 8 (5), 937–948. doi:10.1016/j.jmat.2022.04.006

Sum, T. C., Mathews, N., Xing, G., Lim, S. S., Chong, W. K., Giovanni, D., et al. (2016). Spectral features and charge dynamics of lead halide perovskites: origins and interpretations. Acc. Chem. Res. 49 (2), 294–302. doi:10.1021/acs.accounts.5b00433

Sun, Q., and Yin, W.-J. (2017). Thermodynamic stability trend of cubic perovskites. J. Am. Chem. Soc. 139 (42), 14905–14908. doi:10.1021/jacs.7b09379

Sun, T., and Yuan, J.-M. (2023). Band gap prediction of perovskite materials based on transfer learning. Acta Phys. Sin. 72 (21), 218901. (Chinese Journal). doi:10.7498/aps.72.20231027

Sun, S., Hartono, N. T. P., Ren, Z. D., Oviedo, F., Buscemi, A. M., Layurova, M., et al. (2019). Accelerated development of perovskite-inspired materials via high-throughput synthesis and machine-learning diagnosis. Joule 3 (6), 1437–1451. doi:10.1016/j.joule.2019.05.014

Talapatra, A., Uberuaga, B. P., Stanek, C. R., and Pilania, G. (2021). A machine learning approach for the prediction of formability and thermodynamic stability of single and double perovskite oxides. Chem. Mater. 33 (3), 845–858. doi:10.1021/acs.chemmater.0c03402

Tang, C., Luktarhan, N., and Zhao, Y. (2020). SAAE-DNN: deep learning method on intrusion detection. Symmetry 12 (10), 1695. doi:10.3390/sym12101695

Tang, G., Ghosez, P., and Hong, J. W. (2021). Band-edge orbital engineering of perovskite semiconductors for optoelectronic applications. J. Phys. Chem. Lett. 12 (17), 4227–4239. doi:10.1021/acs.jpclett.0c03816

Tao, Q., Xu, P., Li, M., and Lu, W. (2021). Machine learning for perovskite materials design and discovery. Npj Comput. Mater. 7 (1), 23. doi:10.1038/s41524-021-00495-8

Tena, F., Garnica, O., Lanchares, J., and Hidalgo, J. I. (2021). A critical review of the state-of-the-art on deep neural networks for blood glucose prediction in patients with diabetes. arXiv. doi:10.48550/arXiv.2109.02178

Tharwat, A. (2020). Behavioral analysis of support vector machine classifier with gaussian kernel and imbalanced data. arXiv. doi:10.48550/arXiv.2007.05042

Thoppil, G. S., and Alankar, A. (2022). Predicting the formation and stability of oxide perovskites by extracting underlying mechanisms using machine learning. Comput. Mat. Sci. 211, 111506. doi:10.1016/j.commatsci.2022.111506

Tian, L., Bi, E., Yavuz, I., Deger, C., Tian, Y., Zhou, J., et al. (2025). Divalent cation replacement strategy stabilizes wide-bandgap perovskite for Cu(In,Ga)Se2 tandem solar cells. Nat. Phot. 19, 479–485. doi:10.1038/s41566-025-01618-z

Touati, S., Benghia, A., Hebboul, Z., Lefkaier, I. K., Kanoun, M. B., and Goumri-Said, S. (2024). Predictive machine learning approaches for perovskites properties using their chemical formula: towards the discovery of stable solar cells materials. Neural comput. Appl. 36, 16319–16329. doi:10.1007/s00521-024-09992-5

Turnley, J. W., Agarwal, S., and Agrawal, R. (2024). Rethinking tolerance factor analysis for chalcogenide perovskites. Mat. Horiz. 11 (11), 4802–4808. doi:10.1039/d4mh00689e

Ueno, D., Kawabe, H., Yamasaki, S., Demura, T., and Kato, K. (2021). Feature selection for RNA cleavage efficiency at specific sites using the LASSO regression model in Arabidopsis thaliana. BMC Bioinforma. 22 (1), 380. doi:10.1186/s12859-021-04291-5

Valencia, A., Liu, F., Zhang, X., Bo, X., Li, W., and Daoud, W. A. (2025). Auto-generating a database on the fabrication details of perovskite solar devices. Sci. Data. 12 (1), 270. doi:10.1038/s41597-025-04566-z

Vicent-Luna, J. M., Apergi, S., and Tao, S. (2021). Efficient computation of structural and electronic properties of halide perovskites using density functional tight binding: GFN1-xTB method. J. Chem. Inf. Model. 61 (9), 4415–4424. doi:10.1021/acs.jcim.1c00432

Wan, Z., Wang, Q.-D., Liu, D., and Liang, J. (2021). Prediction of band gap for 2D hybrid organic–inorganic perovskites by using machine learning through molecular graphics descriptors. New J. Chem. 45 (14), 14694–14704. doi:10.1039/d1nj01518d

Wang, X., Ren, L., Zong, H., Wu, C., Qian, J., and Wang, K. (2024). Perovskite single pixel imaging exceeding the visible towards X-ray and THz. J. Mat. Chem. C. 12 (42), 10857–10873. doi:10.1039/d4tc02080d

Wang, J., Wang, X., Feng, S., and Miao, Z. (2024). Studying the thermodynamic phase stability of organic–inorganic hybrid perovskites using machine learning. Molecules 29 (13), 2974. doi:10.3390/molecules29132974

Ward, L., Dunn, A., Faghaninia, A., Zimmermann, N. E. R., Bajaj, S., Wang, Q., et al. (2018). Matminer: an open source toolkit for materials data mining. Comput. Mat. Sci. 152, 60–69. doi:10.1016/j.commatsci.2018.05.018

Workman, M., Chen, D. Z., and Musa, S. M. (2020). “Machine learning for predicting perovskite solar cell opto-electronic properties,” in 2020 International Conference on Data Analytics for Business and Industry: Way Towards a Sustainable Economy (ICDABI) (IEEE), 1–5.

Wu, T., and Wang, J. (2019). Global discovery of stable and non-toxic hybrid organic-inorganic perovskites for photovoltaic systems by combining machine learning method with first principle calculations. Nano Energy 64, 104070. doi:10.1016/j.nanoen.2019.104070

Wu, T., and Wang, J. (2020). Deep mining stable and nontoxic hybrid organic-inorganic perovskites for photovoltaics via progressive machine learning. ACS Appl. Mat. Interfaces. 12 (49), 57821–57831. doi:10.1021/acsami.0c10371

Wu, J., Chen, S. P., and Liu, X. Y. (2020). Efficient hyperparameter optimization through model-based reinforcement learning. Neurocomputing 409, 381–393. doi:10.1016/j.neucom.2020.06.064

Wu, X., Chen, H., Wang, J., and Niu, X. (2023). Machine learning accelerated study of defect energy levels in perovskites. J. Phys. Chem. C. 127 (23), 11387–11395. doi:10.1021/acs.jpcc.3c02493

Wu, B., Zhang, X., Wang, Z., Chen, Z., Liu, S., Liu, J., et al. (2024). Data-driven strategy for bandgap database construction of perovskites and the potential segregation study. J. Mat. Inf. 4 (7). doi:10.20517/jmi.2024.10

Xie, T., and Grossman, J. C. (2018a). Crystal graph convolutional neural networks for an accurate and interpretable prediction of material properties. Phys. Rev. Lett. 120, 145301. doi:10.1103/physrevlett.120.145301

Xie, T., and Grossman, J. C. (2018b). Hierarchical visualization of materials space with graph convolutional neural networks. J. Chem. Phys. 149 (17), 174111. doi:10.1063/1.5047803

Xu, Q., Li, Z., Liu, M., and Yin, W.-J. (2018). Rationalizing perovskite data for machine learning and materials design. J. Phys. Chem. Lett. 9, 6948–6954. doi:10.1021/acs.jpclett.8b03232

Xue, J., Fu, H. D., Yang, B., Jiang, L., Zhang, H. T., Wang, W. R., et al. (2025). Interpretable machine learning applications: a promising prospect of AI for materials. Adv. Funct. Mat. 35, 2507734. doi:10.1002/adfm.202507734

Yang, X., Li, L., Tao, Q., Lu, W., and Li, M. (2021). Rapid discovery of narrow bandgap oxide double perovskites using machine learning. Comput. Mat. Sci. 196, 110528. doi:10.1016/j.commatsci.2021.110528

Yang, J., Manganaris, P., and Mannodi-Kanakkithodi, A. (2023). Discovering novel halide perovskite alloys using multi-fidelity machine learning and genetic algorithm. arXiv. doi:10.48550/arXiv.2310.13153

Yang, C., Chong, X., Hu, M., Yu, W., He, J., Zhang, Y., et al. (2023). Accelerating the discovery of hybrid perovskites with targeted band gaps via interpretable machine learning. ACS Appl. Mat. Interfaces. 15 (34), 40419–40427. doi:10.1021/acsami.3c06392

Yang, W., Zhang, K., Yuan, W., Zhang, L., Qin, C., and Wang, H. (2024). Enhancing stability and performance in tin-based perovskite field-effect transistors through hydrogen bond suppression of organic cation migration. Adv. Mat. 36 (23), 2313461. doi:10.1002/adma.202313461

Yao, Y., Han, D., Spooner, K. B., Jia, X., Ebert, H., Scanlon, D. O., et al. (2025). Adapting explainable machine learning to study mechanical properties of 2D hybrid halide perovskites. Adv. Funct. Mat. 35 (24), 2411652. doi:10.1002/adfm.202411652

Yoshida, H., Zhang, L., Sato, M., Morikawa, T., Kajino, T., Sekito, T., et al. (2014). Calcium titanate photocatalyst prepared by a flux method for reduction of carbon dioxide with water. Catal. Today 251, 132–139. doi:10.1016/j.cattod.2014.10.039

Yu, W., Li, F., Yu, L., Niazi, M. R., Zou, Y., Corzo, D., et al. (2018). Single crystal hybrid perovskite field-effect transistors. Nat. Commun. 9 (1), 5354. doi:10.1038/s41467-018-07706-9

Zhai, S., Xie, H., Cui, P., Guan, D., Wang, J., Zhao, S., et al. (2022). A combined ionic Lewis acid descriptor and machine-learning approach to prediction of efficient oxygen reduction electrodes for ceramic fuel cells. Nat. Energy 7, 866–875. doi:10.1038/s41560-022-01098-3

Zhang, S. C. (2022). Challenges in KNN classification. IEEE Trans. Knowl. Data Eng. 34 (10), 4663–4675. doi:10.1109/tkde.2021.3049250

Zhang, Y., and Xu, X. (2021). Modeling of lattice parameters of cubic perovskite oxides and halides. Heliyon 7, e07601. doi:10.1016/j.heliyon.2021.e07601

Zhang, S., Lu, T., Xu, P., Tao, Q., Li, M., and Lu, W. (2021). Predicting the formability of hybrid organic−inorganic perovskites via an interpretable machine learning strategy. J. Phys. Chem. Lett. 12, 7423–7430. doi:10.1021/acs.jpclett.1c01939

Zhang, J., Li, Y., and Zhou, X. (2023). Machine-learning prediction of the computed band gaps of double perovskite materials. Chennai, India: AIRCC Publishing Corporation, 15–27.

Zhang, L., Zhuang, Z., Fang, Q., and Wang, X. (2023). Study on the automatic identification of ABX3 perovskite crystal structure based on the bond-valence vector sum. Materials 16 (1), 334. doi:10.3390/ma16010334

Zhang, J., Zhang, L., Sun, Y., Li, W., and Quhe, R. (2024). Named entity recognition in the perovskite field based on convolutional neural networks and MatBERT. Comput. Mat. Sci. 240, 113014. doi:10.1016/j.commatsci.2024.113014

Zhang, W. F., Liu, J., Song, W., Shan, J. H., Guan, H. W., Zhou, J., et al. (2025). Chemical passivation and grain-boundary manipulation via in situ cross-linking strategy for scalable flexible perovskite solar cells. Sci. Adv. 11, eadr2290. doi:10.1126/sciadv.adr2290

Zhao, J., and Wang, X. Y. (2022). Screening perovskites from ABO3 combinations generated by constraint satisfaction techniques using machine learning. ACS Omega 7 (12), 10483–10491. doi:10.1021/acsomega.2c00002

Zhao, Y., Zhang, J., Xu, Z., Sun, S., Langner, S., Hartono, N. T. P., et al. (2021). Discovery of temperature-induced stability reversal in perovskites using high-throughput robotic learning. Nat. Commun. 12 (1), 2191. doi:10.1038/s41467-021-22472-x

Zhao, R., Xing, B., Mu, H., Fu, Y., and Zhang, L. (2022). Evaluation of performance of machine learning methods in mining structure–property data of halide perovskite materials. Chin. Phys. B 31 (5), 056302. doi:10.1088/1674-1056/ac5d2d

Zhao, J., Wang, X., Li, H., and Xu, X. (2024). Interpretable machine learning-assisted screening of perovskite oxides. RSC Adv. 14, 3909–3922. doi:10.1039/d3ra08591k

Zhu, X. X., and Hu, B. (2023). Analyzing excited state dynamics of organic and perovskite materials using magneto-optical-electrical comprehensive methods: achieving interdisciplinary research and cross-disciplinary collaboration. Chin. J. Lumin. 44 (7), 1287–1299. doi:10.37188/CJL.20230142

Zhu, Y., Zhang, J., Qu, Z., Jiang, S., Liu, Y., Wu, Z., et al. (2024). Accelerating stability of ABX3 perovskites analysis with machine learning. Ceram. Int. 50 (4), 6250–6258. doi:10.1016/j.ceramint.2023.11.349

Keywords: artificial intelligence, lead-free perovskite materials, data-driven design, performance prediction, interdisciplinary collaboration

Citation: Wang B, Wang J, Li L and Wang D (2026) Data-driven AI approaches for screening high-efficiency, stable, and lead-free perovskite photovoltaic materials: a review. Front. Mater. 12:1669229. doi: 10.3389/fmats.2025.1669229

Received: 25 July 2025; Accepted: 21 November 2025;
Published: 06 January 2026.

Edited by:

Simone Taioli, European Centre for Theoretical Studies in Nuclear Physics and Related Areas (ECT*), Italy

Reviewed by:

Andrea Pedrielli, Bruno Kessler Foundation (FBK), Italy
Tommaso Morresi, European Centre for Theoretical Studies in Nuclear Physics and Related Areas (ECT*), Italy

Copyright © 2026 Wang, Wang, Li and Wang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Juan Wang, wangjuan@xijing.edu.cn

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.