Neutrosophic regression type estimator for the finite population mean and its applications in real data scenarios

Purwar, Neha; Aditya, Kaustav; Bharti; Das, Pankaj; Ahmad, Tauqueer

doi:10.3389/fams.2025.1658157

ORIGINAL RESEARCH article

Front. Appl. Math. Stat., 17 September 2025

Sec. Statistics and Probability

Volume 11 - 2025 | https://doi.org/10.3389/fams.2025.1658157


Neha Purwar¹

Kaustav Aditya²*

Bharti²

Pankaj Das²

Tauqueer Ahmad²

¹The Graduate School, ICAR-IARI, New Delhi, India
²ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India

Research under classical statistics relies on precise, determinate data to estimate population parameters. In certain situations, however, data may be indeterminate or imprecise. Neutrosophic statistics, a generalization of classical statistics, was introduced to address such challenges by handling vague, indeterminate, and uncertain information. Several estimators, including ratio estimators, have been proposed in neutrosophic statistics; these ratio estimators perform well when the correlation between the auxiliary and study variables is strong. In this study, regression-type estimators were developed that perform well whether the correlation between the study and auxiliary variables is high, moderate, or weak. The performance of the proposed estimator was evaluated using simulated data and four real-world datasets with indeterminate data: blood pressure, temperature, natural growth rate, and solar energy. The proposed neutrosophic regression estimator consistently outperformed the existing neutrosophic ratio estimator, the modified neutrosophic ratio estimators, and the neutrosophic exponential ratio estimator in terms of mean squared error (MSE) and percent relative efficiency (PRE). This paper highlights the advantages of the neutrosophic regression estimator for improving estimation accuracy with uncertain and ambiguous data, across the full range of correlation between the study and auxiliary variables considered in the study.

1 Introduction

In surveys, a representative sample is drawn from the population using an appropriate sampling design to make inferences about population parameters. Classical statistics typically relies on precise, determinate data to estimate these parameters. However, real-world data often exhibit imprecision, indeterminacy, or vagueness due to factors such as measurement errors, incomplete information, or inherent variability. For instance, in agricultural surveys, crop yield predictions are uncertain because of fluctuating weather patterns and pest infestations. In demographic studies, estimates of birth rates and natural growth rates can be uncertain because of incomplete census data, reporting errors, and variations in survey methodology [1]. Classical statistical methods provide point estimates but often struggle to handle such indeterminate data, potentially leading to biased or less robust estimates. To overcome this limitation, Prof. Dr. Florentin Smarandache of the University of New Mexico, United States, introduced neutrosophic statistics in 1998 [2] as a generalization of classical statistics. Neutrosophic statistics is designed to handle data characterized by uncertainty, indeterminacy, and ambiguity by representing observations as intervals or sets rather than single precise values. Unlike classical statistics, which assumes deterministic data, neutrosophic statistics incorporates three components: truth-membership (the degree of certainty), indeterminacy-membership (the level of uncertainty or ambiguity), and falsity-membership (the degree of contradiction). This framework allows neutrosophic statistics to model complex, real-world scenarios more effectively by accounting for the full spectrum of uncertainty, including situations in which data are partially known or contradictory.
For example, in health studies, blood pressure measurements may vary within a range because of instrument precision or patient condition; neutrosophic statistics can represent these measurements as intervals, providing a more comprehensive analysis than classical point estimates. Whereas fuzzy statistics focuses on ambiguity through membership degrees between 0 and 1, neutrosophic statistics explicitly addresses indeterminacy, which makes it particularly suitable for scenarios in which data uncertainty cannot be fully resolved by fuzzy sets [3, 4]. This capability enhances the robustness and flexibility of neutrosophic methods for imprecise data across diverse fields such as agriculture, health, environmental studies, and the social sciences.

Neutrosophic probability distributions extend classical distributions by incorporating imprecise parameters to handle indeterminate data. Neutrosophic logic allows these distributions to model uncertainty more effectively than traditional methods [5], and many researchers have since contributed to neutrosophic statistics using a variety of estimation techniques [6–18].

In sample surveys, information on one or more auxiliary variables is often available in addition to data on the study variable. The use of auxiliary information for estimating parameters such as the population mean, the ratio or product of two means, and the coefficient of variation is well established in the classical framework, with key estimators including the ratio, product, and regression estimators. Numerous researchers have used auxiliary information, often together with various transformation techniques, to develop efficient ratio and regression estimators under the classical framework. When the study and auxiliary variables are highly correlated, the sampling error of the classical ratio estimator is substantially smaller than that of the estimator based on the study variable alone. A modified ratio estimator incorporates the coefficient of variation of the auxiliary variable [19]. The use of transformed auxiliary variables to estimate population means has also been investigated under the classical framework [20–22], and the performance of ratio-type estimators has been shown to improve when various types of auxiliary information are incorporated [23]. The classical regression estimator performs better than the classical ratio estimator regardless of whether the correlation between the study and auxiliary variables is positive or negative [24]. A chain ratio-type estimator and a regression-type estimator of the finite population mean based on a linear combination of two auxiliary variables have also been introduced [25]. These classical ratio and regression estimators provide precise estimates of population parameters within a deterministic sampling framework; however, they may not be appropriate in a neutrosophic framework, where the data involve indeterminacy or ambiguity.

Tahir et al. [26] pioneered the estimation of population parameters within a neutrosophic framework using non-linear estimators. They introduced neutrosophic ratio-type and neutrosophic exponential estimators of the population mean under simple random sampling without replacement, particularly for cases with a strong correlation between the auxiliary and study variables. Another study showed that neutrosophic exponential estimators of the population mean perform better when the correlation between these variables is weak or moderate [27]. Alqudah et al. [28] proposed a generalized neutrosophic robust ratio estimator of the finite population mean, designed specifically for indeterminate, imprecise, and outlier-contaminated data. Singh et al. [29] proposed an almost unbiased estimator of the population mean for neutrosophic data using auxiliary information and the ratio estimator. Yadav et al. [30] proposed neutrosophic mean estimators using extremely indeterminate observations in sample surveys. Despite these advances, existing neutrosophic estimators rely mainly on ratio-based methods, which may not perform optimally across the full range of correlation levels: weak, moderate, strong, or negative.

1.1 Novelty and contributions

This study introduces a novel neutrosophic regression estimator designed to estimate the finite population mean in the presence of indeterminate, imprecise, or vague data. Unlike existing neutrosophic ratio and exponential estimators, which are primarily effective under strong positive correlations, the proposed estimator leverages the regression framework to perform robustly across a wide range of correlation levels, including weak, moderate, strong positive, and negative correlations. By extending classical regression principles to the neutrosophic domain, the estimator accommodates interval-based or uncertain data, providing more reliable estimates in real-world scenarios where data precision is compromised, such as in agricultural yield predictions or health metrics with measurement variability. The study also introduces the R-package “neutroSurvey” [31], which facilitates the practical implementation of neutrosophic statistical methods, making them accessible to researchers and practitioners. The performance of the proposed estimator is rigorously evaluated using four real-world datasets (blood pressure, temperature, natural growth rate, and solar energy) and simulated data, demonstrating its superiority over existing neutrosophic ratio and exponential estimators in terms of mean squared error (MSE) and percent relative efficiency (PRE).

1.2 Significance of the study

The significance of this study lies in its development of a versatile and robust neutrosophic regression estimator that addresses the limitations of classical and existing neutrosophic estimators (Figure 1). By effectively handling indeterminate data across various correlation structures, the proposed estimator enhances estimation accuracy in fields where uncertainty is prevalent, such as agriculture, health, and environmental studies, by providing a framework that explicitly accounts for truth, indeterminacy, and falsity. The estimator's ability to perform well across a wide range of correlation levels (weak, moderate, or strong, whether positive or negative) makes it a valuable tool for complex, real-world applications where traditional methods may fail. Furthermore, the "neutroSurvey" R package makes neutrosophic statistical methods widely accessible, enabling researchers to apply these techniques in diverse domains. This study also lays the groundwork for future research into neutrosophic estimators for complex sampling designs and multivariate frameworks, potentially integrating robust and machine learning-based approaches to further enhance estimation precision in uncertain environments.

Figure 1. Flow chart of neutrosophic inference. The chart divides statistical inference into estimation and hypothesis testing and traces study and auxiliary variables from deterministic and neutrosophic data under simple random sampling: high correlation leads to neutrosophic ratio-type estimation, while varying correlation levels lead to neutrosophic regression-type estimation.

2 Methodology

2.1 Neutrosophic observation

Neutrosophic numbers can be represented in multiple ways. In this study, neutrosophic interval values are defined as

\begin{array}{l} Z_{N} = Z_{L} + Z_{U} I_{N}, \quad I_{N} \in [I_{L}, I_{U}], \quad Z_{N} \in [a, b], \end{array}

where

Z_L indicates the deterministic (lower) part of the neutrosophic number, representing a certain or known component of the data.

Z_U indicates the coefficient of the indeterminate part, which quantifies the magnitude of indeterminacy.

I_N indicates an indeterminacy component, which lies within the interval I_N ∈ [I_L, I_U ], where I_L is the lower bound, and I_U is the upper bound of the indeterminacy interval.

The neutrosophic number is expressed as an interval Z_N ∈ [a, b]

where:

a is the lower bound of the neutrosophic interval, representing the minimum possible value.

b is the upper bound of the neutrosophic interval, representing the maximum possible value.

The expression Z_N = Z_L + Z_U I_N with I_N ∈ [0, 1] and Z_N ∈ [a, b] represents a neutrosophic number as the interval [Z_L, Z_L + Z_U] = [a, b], where Z_L = a and Z_U = b − a. When I_N = 0, that is, when the indeterminacy component is evaluated at the crisp value zero, Z_N = Z_L = a, the lower bound of the neutrosophic interval. In this case the neutrosophic number reduces to a determinate value, which aligns neutrosophic statistics with classical statistics, where all data are determinate [4].
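To make this representation concrete, the following Python sketch stores a neutrosophic interval number through its determinate part Z_L and its indeterminacy coefficient Z_U = b − a. The class and method names are hypothetical illustrations, not part of the neutroSurvey package; evaluating at I_N = 0 returns the classical value a and evaluating at I_N = 1 returns b.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class NeutrosophicNumber:
    """Interval neutrosophic number Z_N = Z_L + Z_U * I_N with I_N in [0, 1] (hypothetical class)."""

    z_l: float  # determinate (lower) part, equal to the lower bound a
    z_u: float  # coefficient of the indeterminate part, equal to b - a

    @classmethod
    def from_interval(cls, a: float, b: float) -> NeutrosophicNumber:
        if b < a:
            raise ValueError("upper bound b must not be smaller than lower bound a")
        return cls(z_l=a, z_u=b - a)

    def at(self, i_n: float) -> float:
        """Evaluate Z_N at a crisp indeterminacy value I_N in [0, 1]."""
        if not 0.0 <= i_n <= 1.0:
            raise ValueError("I_N must lie in [0, 1]")
        return self.z_l + self.z_u * i_n

    @property
    def interval(self) -> tuple[float, float]:
        """The interval [a, b] = [Z_L, Z_L + Z_U]."""
        return (self.z_l, self.z_l + self.z_u)


z = NeutrosophicNumber.from_interval(120.0, 135.0)  # hypothetical systolic reading
print(z.interval)  # (120.0, 135.0)
print(z.at(0.0))   # 120.0, the classical (determinate) value
print(z.at(0.5))   # 127.5
```

Setting I_N = 0 collapses the number to its determinate part, which is the reduction to classical statistics described above.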

Let T = (T₁, T₂, …, T_N) be a finite population of N units, from which a random sample of size n is drawn by simple random sampling without replacement (SRSWOR). Let Y_N(i) ∈ (Y_L, Y_U) be the i^th unit of the neutrosophic population on the variable of interest Y_N (study variable), and let X_N(i) ∈ (X_L, X_U) be the i^th unit of the neutrosophic population on the auxiliary variable X_N, which is correlated with Y_N; y_N(i) and x_N(i) denote the i^th sample observations of the neutrosophic study and auxiliary variables, respectively. Let ${\bar{Y}}_{N} \in ({\bar{Y}}_{L}, {\bar{Y}}_{U})$ and ${\bar{X}}_{N} \in ({\bar{X}}_{L}, {\bar{X}}_{U})$ be the neutrosophic population means of Y_N and X_N, respectively, and let ${\bar{y}}_{N} \in ({\bar{y}}_{L}, {\bar{y}}_{U})$ and ${\bar{x}}_{N} \in ({\bar{x}}_{L}, {\bar{x}}_{U})$ be the corresponding neutrosophic sample means. Let C_yN ∈ (C_yL, C_yU) and C_xN ∈ (C_xL, C_xU) be the neutrosophic coefficients of variation of Y_N and X_N, respectively, and let ρ_yxN ∈ (ρ_yxL, ρ_yxU) be the neutrosophic correlation coefficient between Y_N and X_N. Let $S_{y N}^{2} \in (S_{y L}^{2}, S_{y U}^{2})$ and $S_{x N}^{2} \in (S_{x L}^{2}, S_{x U}^{2})$ be the neutrosophic population variances of Y_N and X_N, respectively, and let S_yxN ∈ (S_yxL, S_yxU) be the neutrosophic covariance between Y_N and X_N. In addition, β_2(x)N ∈ (β_2(x)L, β_2(x)U) is the neutrosophic coefficient of kurtosis of the auxiliary variable X_N. Let e_yN ∈ (e_yL, e_yU) and e_xN ∈ (e_xL, e_xU) be the neutrosophic error terms of the sample means of Y_N and X_N, respectively; let $e_{s_{x N}^{2}} \in (e_{s_{x L}^{2}}, e_{s_{x U}^{2}})$ and $e_{s_{y N}^{2}} \in (e_{s_{y L}^{2}}, e_{s_{y U}^{2}})$ be the neutrosophic error terms of the sample variances; and let e_{s_yxN} ∈ (e_{s_yxL}, e_{s_yxU}) be the neutrosophic error term of the sample covariance. These terms are defined in Table 1.

Table 1. Terminological framework for neutrosophic estimators.

2.2 Flow chart

The flowchart below illustrates the process for applying the proposed methods to neutrosophic data.

2.3 Existing neutrosophic estimators

i. Tahir et al. [26] proposed the neutrosophic ratio estimator ${\bar{y}}_{R N}$ of the finite population mean in the presence of an auxiliary variable, given by

\begin{array}{ll} {\bar{y}}_{R N} = \frac{{\bar{y}}_{N}}{{\bar{x}}_{N}} {\bar{X}}_{N}, & (1) \end{array}

where ${\bar{y}}_{R N} \in ({\bar{y}}_{R N L}, {\bar{y}}_{R N U})$

To the first order of approximation, the bias and MSE of ${\bar{y}}_{R N}$ are

$\mathrm{Bias}({\bar{y}}_{R N}) ≅ θ_{N} {\bar{Y}}_{N} (C_{x N}^{2} - ρ_{y x N} C_{y N} C_{x N})$,

$\mathrm{MSE}({\bar{y}}_{R N}) = E{({\bar{y}}_{R N} - {\bar{Y}}_{N})}^{2} ≅ θ_{N} {\bar{Y}}_{N}^{2} (C_{y N}^{2} + C_{x N}^{2} - 2 ρ_{y x N} C_{y N} C_{x N})$.

ii. Tahir et al. [26] also proposed the neutrosophic ratio-type estimator ${\bar{y}}_{R N_{1}}$, which uses the coefficient of variation C_xN of the auxiliary variable:

\begin{array}{ll} {\bar{y}}_{R N_{1}} = {\bar{y}}_{N} \frac{{\bar{X}}_{N} + C_{x N}}{{\bar{x}}_{N} + C_{x N}}, & (2) \end{array}

where ${\bar{y}}_{R N_{1}} \in ({\bar{y}}_{R N L_{1}}, {\bar{y}}_{R N U_{1}})$

Writing the estimator in terms of the error terms in Table 1,

${\bar{y}}_{R N_{1}} = {\bar{Y}}_{N} (1 + e_{y N}) \frac{{\bar{X}}_{N} + C_{x N}}{{\bar{X}}_{N} (1 + e_{x N}) + C_{x N}}$,

the bias and MSE of ${\bar{y}}_{R N_{1}}$ to the first order of approximation are

$\mathrm{Bias}({\bar{y}}_{R N_{1}}) ≅ θ_{N} {\bar{Y}}_{N} [{(\frac{{\bar{X}}_{N}}{{\bar{X}}_{N} + C_{x N}})}^{2} C_{x N}^{2} - \frac{{\bar{X}}_{N}}{{\bar{X}}_{N} + C_{x N}} ρ_{y x N} C_{y N} C_{x N}]$,

$\mathrm{MSE}({\bar{y}}_{R N_{1}}) = E{({\bar{y}}_{R N_{1}} - {\bar{Y}}_{N})}^{2} ≅ θ_{N} {\bar{Y}}_{N}^{2} [C_{y N}^{2} + {(\frac{{\bar{X}}_{N}}{{\bar{X}}_{N} + C_{x N}})}^{2} C_{x N}^{2} - 2 (\frac{{\bar{X}}_{N}}{{\bar{X}}_{N} + C_{x N}}) ρ_{y x N} C_{y N} C_{x N}]$.

iii. The neutrosophic ratio-type estimator ${\bar{y}}_{R N_{2}}$ proposed by Tahir et al. [26], which uses the coefficient of kurtosis β_2(x)N of the auxiliary variable, is given below:

\begin{array}{ll} {\bar{y}}_{R N_{2}} = {\bar{y}}_{N} \frac{{\bar{X}}_{N} + β_{2 (x) N}}{{\bar{x}}_{N} + β_{2 (x) N}}, & (3) \end{array}

where ${\bar{y}}_{R N_{2}} \in ({\bar{y}}_{R N L_{2}}, {\bar{y}}_{R N U_{2}})$.

To the first order of approximation, $\mathrm{Bias}({\bar{y}}_{R N_{2}})$ and $\mathrm{MSE}({\bar{y}}_{R N_{2}})$ can be expressed as follows:

\begin{array}{l} \mathrm{Bias}({\bar{y}}_{R N_{2}}) ≅ θ_{N} {\bar{Y}}_{N} [{(\frac{{\bar{X}}_{N}}{{\bar{X}}_{N} + β_{2 (x) N}})}^{2} C_{x N}^{2} - (\frac{{\bar{X}}_{N}}{{\bar{X}}_{N} + β_{2 (x) N}}) ρ_{y x N} C_{y N} C_{x N}], \\ \mathrm{MSE}({\bar{y}}_{R N_{2}}) ≅ θ_{N} {\bar{Y}}_{N}^{2} [C_{y N}^{2} + {(\frac{{\bar{X}}_{N}}{{\bar{X}}_{N} + β_{2 (x) N}})}^{2} C_{x N}^{2} - 2 (\frac{{\bar{X}}_{N}}{{\bar{X}}_{N} + β_{2 (x) N}}) ρ_{y x N} C_{y N} C_{x N}] . \end{array}

iv. Tahir et al. [26] also proposed a neutrosophic ratio estimator ${\bar{y}}_{R N_{3}}$ that incorporates both the coefficient of variation C_xN and the coefficient of kurtosis β_2(x)N of the auxiliary variable:

\begin{array}{ll} {\bar{y}}_{R N_{3}} = {\bar{y}}_{N} \frac{{\bar{X}}_{N} β_{2 (x) N} + C_{x N}}{{\bar{x}}_{N} β_{2 (x) N} + C_{x N}}, & (4) \end{array}

where ${\bar{y}}_{R N_{3}} \in ({\bar{y}}_{R N L_{3}}, {\bar{y}}_{R N U_{3}})$

Writing the estimator in terms of the error terms,

${\bar{y}}_{R N_{3}} = {\bar{Y}}_{N} (1 + e_{y N}) (\frac{{\bar{X}}_{N} β_{2 (x) N} + C_{x N}}{{\bar{X}}_{N} (1 + e_{x N}) β_{2 (x) N} + C_{x N}})$,

the bias and MSE of ${\bar{y}}_{R N_{3}}$ to the first order of approximation are

$\mathrm{Bias}({\bar{y}}_{R N_{3}}) ≅ θ_{N} {\bar{Y}}_{N} [{(\frac{{\bar{X}}_{N} β_{2 (x) N}}{{\bar{X}}_{N} β_{2 (x) N} + C_{x N}})}^{2} C_{x N}^{2} - (\frac{{\bar{X}}_{N} β_{2 (x) N}}{{\bar{X}}_{N} β_{2 (x) N} + C_{x N}}) ρ_{y x N} C_{y N} C_{x N}]$,

$\mathrm{MSE}({\bar{y}}_{R N_{3}}) = E{({\bar{y}}_{R N_{3}} - {\bar{Y}}_{N})}^{2} ≅ θ_{N} {\bar{Y}}_{N}^{2} [C_{y N}^{2} + {(\frac{{\bar{X}}_{N} β_{2 (x) N}}{{\bar{X}}_{N} β_{2 (x) N} + C_{x N}})}^{2} C_{x N}^{2} - 2 (\frac{{\bar{X}}_{N} β_{2 (x) N}}{{\bar{X}}_{N} β_{2 (x) N} + C_{x N}}) ρ_{y x N} C_{y N} C_{x N}]$.

v. The neutrosophic exponential estimator developed by Tahir et al. [26] is given below:

\begin{array}{ll} {\bar{y}}_{R N_{E}} = {\bar{y}}_{N} \exp (\frac{{\bar{X}}_{N} - {\bar{x}}_{N}}{{\bar{X}}_{N} + {\bar{x}}_{N}}), & (5) \end{array}

where ${\bar{y}}_{R N_{E}} \in ({\bar{y}}_{R N L_{E}}, {\bar{y}}_{R N U_{E}})$

Writing the estimator in terms of the error terms,

${\bar{y}}_{R N_{E}} = {\bar{Y}}_{N} (1 + e_{y N}) \exp (\frac{{\bar{X}}_{N} - {\bar{X}}_{N} (1 + e_{x N})}{{\bar{X}}_{N} + {\bar{X}}_{N} (1 + e_{x N})})$,

the bias and MSE of ${\bar{y}}_{R N_{E}}$ to the first order of approximation are

$\mathrm{Bias}({\bar{y}}_{R N_{E}}) ≅ θ_{N} {\bar{Y}}_{N} (\frac{3 C_{x N}^{2}}{8} - \frac{ρ_{y x N} C_{y N} C_{x N}}{2})$,

$\mathrm{MSE}({\bar{y}}_{R N_{E}}) = E{({\bar{y}}_{R N_{E}} - {\bar{Y}}_{N})}^{2} ≅ θ_{N} {\bar{Y}}_{N}^{2} (C_{y N}^{2} + \frac{C_{x N}^{2}}{4} - ρ_{y x N} C_{y N} C_{x N})$.

Derivations of the bias and MSE of all the above estimators are given in Appendix A.
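The point estimates in Equations 1–5 and their first-order bias and MSE can be evaluated bound by bound. The Python sketch below is an illustration under stated assumptions: θ_N is taken as 1/n − 1/N (the usual SRSWOR factor), and all parameter values are hypothetical. It uses the fact, visible in the formulas above, that ȳ_RN, ȳ_RN1, ȳ_RN2, and ȳ_RN3 share one form with shrinkage factor λ = aX̄_N/(aX̄_N + b), where (a, b) = (1, 0), (1, C_xN), (1, β_2(x)N), and (β_2(x)N, C_xN), respectively.

```python
import math


def point_estimates(y_bar, x_bar, X_bar, C_x, beta2_x):
    """Equations (1)-(5) evaluated for ONE bound from its sample means."""
    return {
        "RN": y_bar / x_bar * X_bar,
        "RN1": y_bar * (X_bar + C_x) / (x_bar + C_x),
        "RN2": y_bar * (X_bar + beta2_x) / (x_bar + beta2_x),
        "RN3": y_bar * (X_bar * beta2_x + C_x) / (x_bar * beta2_x + C_x),
        "RNE": y_bar * math.exp((X_bar - x_bar) / (X_bar + x_bar)),
    }


def first_order_bias_mse(Y_bar, X_bar, C_y, C_x, rho, beta2_x, theta):
    """Return {name: (bias, MSE)} to the first order of approximation for ONE bound."""
    k = rho * C_y * C_x
    out = {}
    # Ratio-type estimators: shrinkage factor lam = a * X_bar / (a * X_bar + b).
    for name, (a, b) in {
        "RN": (1.0, 0.0),
        "RN1": (1.0, C_x),
        "RN2": (1.0, beta2_x),
        "RN3": (beta2_x, C_x),
    }.items():
        lam = a * X_bar / (a * X_bar + b)
        out[name] = (
            theta * Y_bar * (lam ** 2 * C_x ** 2 - lam * k),
            theta * Y_bar ** 2 * (C_y ** 2 + lam ** 2 * C_x ** 2 - 2 * lam * k),
        )
    out["RNE"] = (
        theta * Y_bar * (3 * C_x ** 2 / 8 - k / 2),
        theta * Y_bar ** 2 * (C_y ** 2 + C_x ** 2 / 4 - k),
    )
    return out


theta = 1 / 20 - 1 / 100  # assumed theta_N = 1/n - 1/N with n = 20, N = 100
lower = first_order_bias_mse(50.0, 20.0, 0.30, 0.25, 0.80, 2.2, theta)  # hypothetical
upper = first_order_bias_mse(60.0, 24.0, 0.35, 0.30, 0.75, 2.4, theta)  # hypothetical
for name in lower:
    print(f"{name}: MSE in ({lower[name][1]:.4f}, {upper[name][1]:.4f})")
```

Running the same function on the lower- and upper-bound parameters yields the neutrosophic MSE interval of each estimator.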

3 Proposed neutrosophic regression estimator for the estimation of a finite population parameter

The proposed neutrosophic regression estimator for the estimation of a finite population parameter is given as follows:

\begin{array}{ll} {\bar{y}}_{N R e g} = {\bar{y}}_{N} - b_{N} ({\bar{x}}_{N} - {\bar{X}}_{N}), & (6) \end{array}

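where ${\bar{y}}_{N R e g} \in ({\bar{y}}_{N R e g_{L}}, {\bar{y}}_{N R e g_{U}})$ and b_N ∈ (b_L, b_U) is the sample regression coefficient, $b_{N} = s_{y x N} / s_{x N}^{2}$, which is computed from the sample and replaces the unknown population regression coefficient B_N.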

To obtain the bias, take the expectation on both sides of Equation 6:

\begin{array}{r} E ({\bar{y}}_{N R e g} - {\bar{y}}_{N}) = - [E (b_{N} {\bar{x}}_{N}) - {\bar{X}}_{N} E (b_{N})] \\ = - [E (b_{N} {\bar{x}}_{N}) - E ({\bar{x}}_{N}) E (b_{N})] . \end{array}

Since ${\bar{y}}_{N}$ and ${\bar{x}}_{N}$ are unbiased for ${\bar{Y}}_{N}$ and ${\bar{X}}_{N}$ under SRSWOR, the left-hand side equals $E({\bar{y}}_{N R e g}) - {\bar{Y}}_{N}$, and the bias of ${\bar{y}}_{N R e g}$ is

$\mathrm{Bias}({\bar{y}}_{N R e g}) = -\mathrm{Cov}(b_{N}, {\bar{x}}_{N})$.

To obtain the MSE of the proposed neutrosophic regression estimator ${\bar{y}}_{N R e g}$ to the first order of approximation, the neutrosophic error terms given in Table 1 are substituted into Equation 6; applying a Taylor series expansion gives

${\bar{y}}_{N R e g} = {\bar{Y}}_{N} (1 + e_{y N}) - B_{N} (\frac{1 + e_{s_{y x N}}}{1 + e_{s_{x N}^{2}}}) {\bar{X}}_{N} e_{x N}$ ,

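where $B_{N} = S_{y x N} / S_{x N}^{2}$ is the population regression coefficient, a constant. Retaining only first-order terms,

${\bar{y}}_{N R e g} - {\bar{Y}}_{N} ≅ {\bar{Y}}_{N} e_{y N} - B_{N} {\bar{X}}_{N} e_{x N}$,

so that $E({\bar{y}}_{N R e g} - {\bar{Y}}_{N}) ≅ 0$; that is, the estimator is unbiased to the first order of approximation.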

The MSE of the proposed estimator ${\bar{y}}_{N R e g}$ can then be obtained as follows:

$\mathrm{MSE}({\bar{y}}_{N R e g}) = E{({\bar{y}}_{N R e g} - {\bar{Y}}_{N})}^{2}$

$= E{({\bar{Y}}_{N} e_{y N} - B_{N} {\bar{X}}_{N} e_{x N})}^{2}$

$= E({\bar{Y}}_{N}^{2} e_{y N}^{2} + {\bar{X}}_{N}^{2} B_{N}^{2} e_{x N}^{2} - 2 {\bar{Y}}_{N} B_{N} {\bar{X}}_{N} e_{y N} e_{x N})$,

which gives

\begin{array}{ll} \mathrm{MSE}({\bar{y}}_{N R e g}) = θ_{N} (S_{y N}^{2} + S_{x N}^{2} B_{N}^{2} - 2 B_{N} ρ_{y x N} S_{y N} S_{x N}) . & (7) \end{array}

Differentiating Equation 7 with respect to B_N and setting the derivative equal to zero gives the following optimum, which is a minimum because the second derivative $2 θ_{N} S_{x N}^{2}$ is positive:

\begin{array}{ll} B_{N} = ρ_{y x N} \frac{S_{y N}}{S_{x N}} . & (8) \end{array}

Substituting B_N from Equation 8 into Equation 7 gives

\begin{array}{ll} \mathrm{MSE}_{min}({\bar{y}}_{N R e g}) = θ_{N} S_{y N}^{2} (1 - ρ_{x y N}^{2}) . & (9) \end{array}

In terms of the coefficient of variation, Equation 9 can be written as

\begin{array}{ll} \mathrm{MSE}_{min}({\bar{y}}_{N R e g}) = θ_{N} {\bar{Y}}_{N}^{2} C_{y N}^{2} (1 - ρ_{x y N}^{2}) . & (10) \end{array}
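Equations 7–10 can be checked numerically. To the first order of approximation, the MSE of ȳ_RN equals Equation 7 evaluated at B_N = Ȳ_N/X̄_N, that of ȳ_RNE equals Equation 7 at B_N = Ȳ_N/(2X̄_N), and those of ȳ_RN1 to ȳ_RN3 equal Equation 7 at B_N = λȲ_N/X̄_N. Since Equation 9 is the minimum over B_N, the regression estimator's first-order MSE cannot exceed any of them. The Python sketch below illustrates this for one bound, using hypothetical parameter values and the assumed θ_N = 1/n − 1/N.

```python
def reg_mse(B, S_y, S_x, rho, theta):
    """Equation (7) for ONE bound."""
    return theta * (S_y ** 2 + S_x ** 2 * B ** 2 - 2 * B * rho * S_y * S_x)


def reg_mse_min(S_y, rho, theta):
    """Equation (9) for ONE bound."""
    return theta * S_y ** 2 * (1 - rho ** 2)


# Hypothetical one-bound parameters; theta_N = 1/n - 1/N is assumed (n = 20, N = 100).
Y_bar, X_bar, S_y, S_x, rho = 40.0, 20.0, 4.0, 2.5, 0.6
theta = 1 / 20 - 1 / 100

B_opt = rho * S_y / S_x  # Equation (8)
R = Y_bar / X_bar        # slope implicitly used by the ratio estimator
print("Equation (7) at B_opt:", reg_mse(B_opt, S_y, S_x, rho, theta))
print("Equation (9):         ", reg_mse_min(S_y, rho, theta))
print("ratio, B = R:         ", reg_mse(R, S_y, S_x, rho, theta))
print("exponential, B = R/2: ", reg_mse(R / 2, S_y, S_x, rho, theta))
```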

3.1 Performance measures

The proposed neutrosophic regression estimator was compared with the existing estimators ${\bar{y}}_{R N}, {\bar{y}}_{R N_{1}}, {\bar{y}}_{R N_{2}}, {\bar{y}}_{R N_{3}}$, and ${\bar{y}}_{R N_{E}}$ using two performance measures: mean squared error (MSE) and percent relative efficiency (PRE).

The PRE of an estimator T with respect to a competing estimator T₁ is

\begin{array}{l} \mathrm{PRE}(T_{1}, T) = \frac{\mathrm{MSE}(T_{1})}{\mathrm{MSE}(T)} \times 100 . \end{array}

A PRE greater than 100 indicates that T is more efficient than T₁; the estimator whose PRE exceeds 100 against every competitor is the most efficient.
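A minimal Python sketch of the PRE computation follows. Applying it bound by bound to interval-valued MSEs is an assumption about how the neutrosophic PRE is formed, and the MSE values are hypothetical.

```python
def pre(mse_t1, mse_t):
    """PRE of T with respect to T1: 100 * MSE(T1) / MSE(T); values > 100 favour T."""
    return 100.0 * mse_t1 / mse_t


def neutrosophic_pre(mse_t1, mse_t):
    """Apply PRE bound by bound to paired MSEs (lower-bound value, upper-bound value)."""
    return tuple(pre(a, b) for a, b in zip(mse_t1, mse_t))


# Hypothetical MSE pairs for an existing estimator T1 and the proposed estimator T.
print(neutrosophic_pre((3.0, 6.0), (2.0, 4.0)))  # (150.0, 150.0): T is more efficient
```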

4 Results and discussion

4.1 Evaluation of proposed neutrosophic regression estimator using real datasets

4.1.1 Description of the datasets

The proposed neutrosophic regression-type estimator is new, and related literature is limited. In this study, it was compared with the existing neutrosophic ratio-type and exponential estimators given in Equations 1–5. For the empirical evaluation, four real datasets characterized by indeterminacy were selected, as described below:

Dataset 1: Indeterminate blood pressure data for Japan from 1975 to 2015 (https://ncdrisc.org/index), comprising 82 units: 41 annual records for men and 41 for women. This dataset includes five neutrosophic variables recorded year-wise. In this study, the number of adults with raised blood pressure (Y_N) and the age-standardized prevalence of raised blood pressure with its 95% uncertainty interval (X_N) were taken as the study and auxiliary variables, respectively, with population size N = 82 for both the lower and upper bounds under the neutrosophic framework.

Dataset 2: Seasonal and Annual Minimum-Maximum Temperature Series (1901–2017), sourced from https://data.gov.in/resource/seasonal-and-annual-minimum-maximum-temperature-series-1901-2017. Temperatures from March to May (Y_N) and the minimum and maximum temperatures in January and February (X_N) were taken as the study and auxiliary variables, respectively, with population size N = 117 for both the lower and upper bounds under the neutrosophic framework.

Dataset 3: Natural growth rate data from SRS Bulletin 2020 (1) for 21 Bigger States, 9 Smaller States, and 6 Union Territories. The natural growth rate was taken as the neutrosophic study variable (Y_N) and the birth rate as the neutrosophic auxiliary variable (X_N), with population size N = 36 (21 + 9 + 6) for both the lower and upper bounds under the neutrosophic framework.

Dataset 4: Indeterminate solar energy data from Aslam and Algarni (2020) [18], consisting of ten neutrosophic variables recorded over 12 months, from mid-June 2013 to mid-June 2014. Next-day global horizontal irradiance (ND-GHI) was taken as the neutrosophic study variable (Y_N) and temperature as the neutrosophic auxiliary variable (X_N), with population size N = 12 for both the lower and upper bounds under the neutrosophic framework.

4.1.2 Descriptive statistics of datasets

To explore the efficiency of the proposed neutrosophic regression estimator on the four real-world datasets, descriptive statistics for each dataset were computed under the neutrosophic and classical frameworks; they are presented in Table 2. The descriptive analysis under the neutrosophic framework was carried out with the authors' R package "neutroSurvey" (https://CRAN.R-project.org/package=neutroSurvey). For dataset 1 (blood pressure), the neutrosophic mean ${\bar{Y}}_{N}$ of the study variable Y_N lies in the interval [11913108, 16490350], with a classical mean of 11,913,108, and the neutrosophic mean ${\bar{X}}_{N}$ of the auxiliary variable X_N lies in [0.2121, 0.2854], with a classical value of 0.2121. The neutrosophic coefficients of variation C_yN and C_xN of the study and auxiliary variables lie in [0.2889, 0.4048] and [0.3386, 0.5025], respectively, with corresponding classical values of 0.1107 and 0.2967. The neutrosophic correlation coefficient ρ_xyN between the study and auxiliary variables lies in [0.3932, 0.5503], with a classical value of 0.5502, while the coefficient of kurtosis β_2(x)N of the auxiliary variable lies in [2.115907, 2.239048], with a classical value of 2.1160.

Table 2. Descriptive statistics for all four neutrosophic data sets in both neutrosophic and classical frameworks.

For dataset 2 (temperature), the neutrosophic mean ${\bar{Y}}_{N}$ of the study variable lies in [20.6685, 31.5176], with a classical mean of 20.6685, and the neutrosophic mean ${\bar{X}}_{N}$ of the auxiliary variable lies in [13.8946, 24.6295], with a classical mean of 13.8946. The neutrosophic coefficients of variation C_yN and C_xN of the study and auxiliary variables lie in [0.3447, 0.5261] and [0.4364, 0.7753], respectively, with corresponding classical values of 0.0249 and 0.0407. The neutrosophic correlation coefficient ρ_xyN between the study and auxiliary variables lies in [0.6126, 0.6759], with a classical value of 0.6126, and the neutrosophic coefficient of kurtosis β_2(x)N of the auxiliary variable lies in [5.2403, 6.0507], with a classical value of 5.2404.

For dataset 3 (natural growth rate), the neutrosophic mean ${\bar{Y}}_{N}$ of the study variable lies in [9.7583, 11.7667], with a classical mean of 9.7583, and the neutrosophic mean ${\bar{X}}_{N}$ of the auxiliary variable lies in [14.7083, 17.8167], with a classical value of 14.7083. The neutrosophic coefficients of variation C_yN and C_xN of the study and auxiliary variables lie in [0.3360, 0.4886] and [0.2531, 0.3607], respectively, with corresponding classical values of 0.3489 and 0.2222. The neutrosophic correlation coefficient ρ_xyN between the study and auxiliary variables was obtained as [0.9652, 0.9585], with a classical value of 0.9652, and the coefficient of kurtosis β_2(x)N of the auxiliary variable as [2.5910, 2.2163], with a classical value of 2.5910.

For dataset 4 (solar energy), the neutrosophic means ${\bar{Y}}_{N}$ and ${\bar{X}}_{N}$ of the study and auxiliary variables lie in [5218.00, 6185.92] and [27.20, 29.01], respectively, with corresponding classical means of 5218.00 and 27.20. The neutrosophic coefficients of variation C_yN and C_xN lie in [0.2712, 0.3290] and [0.5108, 0.8972], respectively, with corresponding classical values of 0.2626 and 0.2995. The neutrosophic correlation coefficient ρ_xyN between the study and auxiliary variables was obtained as [0.8216, 0.7391], with a classical value of 0.8216, and the coefficient of kurtosis β_2(x)N of the auxiliary variable lies in [1.5174, 1.7496], with a classical value of 1.5174.

4.1.3 Comparison of proposed estimators with existing estimators

The performance of the proposed neutrosophic regression estimator ${\bar{y}}_{N R e g}$ was compared with that of the existing estimators ${\bar{y}}_{R N}, {\bar{y}}_{R N_{1}}, {\bar{y}}_{R N_{2}}, {\bar{y}}_{R N_{3}}$, and ${\bar{y}}_{R N_{E}}$ in terms of MSE and PRE under both the neutrosophic and classical frameworks. Samples of size n were drawn by simple random sampling without replacement (SRSWOR). The MSEs of all estimators for these datasets are presented in Tables 3–6. The analysis was conducted using the R package "neutroSurvey" (https://CRAN.R-project.org/package=neutroSurvey), which was developed for the empirical comparison of the proposed neutrosophic regression estimator with the existing estimators above.

Table 3. Comparison of MSE for existing vs. proposed estimators using dataset 1.

For dataset 1, samples of size n = 17, 21, and 25, corresponding to approximately 20%, 25%, and 30% of the population size N = 82, respectively, were selected by SRSWOR.
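The sampling step can be sketched in Python as follows. The computed sampling fractions correspond to the approximate percentages above; θ_N = 1/n − 1/N is an assumed definition, the seed is arbitrary, and the same unit indices are meant to be used for the lower- and upper-bound series because each unit carries both bounds.

```python
import random

N = 82  # population size of dataset 1

for n in (17, 21, 25):
    f = n / N             # sampling fraction
    th = 1 / n - 1 / N    # assumed theta_N for SRSWOR
    print(f"n = {n}: f = {f:.1%}, theta_N = {th:.5f}")

rng = random.Random(2025)                        # arbitrary seed for reproducibility
sample_units = sorted(rng.sample(range(N), 17))  # SRSWOR: 17 distinct unit indices
print(sample_units)
```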

Table 3 shows that, at sample size n = 17, the MSEs of the estimators ${\bar{y}}_{R N}, {\bar{y}}_{R N_{1}}, {\bar{y}}_{R N_{2}}, {\bar{y}}_{R N_{3}}, {\bar{y}}_{R N_{E}}$, and ${\bar{y}}_{N R e g}$ are [5.9847, 3.2511], [3.9036, 1.7627], [4.9357, 1.8889], [3.9261, 1.9452], [3.8567, 1.8637], and [3.8496, 1.7561], respectively, with corresponding classical MSEs of 4.2459, 0.8258, 0.6410, 1.4820, 1.0714, and 0.5651. At this sample size, the proposed neutrosophic regression estimator therefore has the lowest MSE of all the estimators under both the neutrosophic and classical approaches. It is closely followed by the neutrosophic exponential estimator, while the existing neutrosophic ratio-type estimators are comparatively less efficient. As the sample size increases from 17 to 25, the MSEs of all estimators decrease markedly, reflecting improved precision with larger samples. At n = 25, the MSEs of ${\bar{y}}_{R N}, {\bar{y}}_{R N_{1}}, {\bar{y}}_{R N_{2}}, {\bar{y}}_{R N_{3}}, {\bar{y}}_{R N_{E}}$, and ${\bar{y}}_{N R e g}$ are [3.5687, 1.9386], [2.3277, 1.0511], [2.9432, 1.1264], [2.3412, 1.1599], [2.2956, 1.1113], and [2.2956, 1.0472], respectively, with corresponding classical MSEs of 2.5312, 0.4925, 0.3822, 0.8837, 0.6389, and 0.3369; the proposed estimator again has the lowest MSE, tying the exponential estimator only at the first interval entry. This declining trend in MSE with increasing sample size holds under both the neutrosophic and classical approaches. By accounting for the indeterminacy in the blood pressure data, the neutrosophic framework yields interval-valued MSEs, whereas the classical framework yields single point values; the intervals convey the uncertainty in the data that a single value cannot capture.
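Furthermore, for ${\bar{y}}_{R N}$ the classical MSE falls within the corresponding neutrosophic MSE interval at both n = 17 and n = 25, whereas for the remaining estimators the classical MSE lies below the interval. A single classical value can therefore understate the uncertainty that the neutrosophic interval captures, which points to the broader applicability of the neutrosophic estimation procedure.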
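For reference, the decline in MSE with sample size follows from the first-order MSE of the classical regression estimator under SRSWOR. The display below is the standard textbook result written with generic symbols (sample means $\bar{y}, \bar{x}$, population mean $\bar{X}$, population variances $S_y^2, S_x^2$, covariance $S_{xy}$, correlation $\rho$). Treating each bound of the neutrosophic data in this way is a simplifying assumption made here for illustration, not the authors' derivation.

```latex
\begin{aligned}
\bar{y}_{reg} &= \bar{y} + b\,(\bar{X} - \bar{x}), \qquad \theta = \frac{1}{n} - \frac{1}{N},\\
\operatorname{MSE}(\bar{y}_{reg}) &\approx \theta\left(S_y^2 + b^2 S_x^2 - 2b\,S_{xy}\right),\\
b^{*} &= \frac{S_{xy}}{S_x^2} \quad \text{(from } \partial\,\operatorname{MSE}/\partial b = 2\theta\,(b\,S_x^2 - S_{xy}) = 0\text{)},\\
\operatorname{MSE}_{\min}(\bar{y}_{reg}) &\approx \theta\, S_y^2\,(1 - \rho^2), \qquad \rho = \frac{S_{xy}}{S_x S_y},\\
\frac{\theta_{n=25}}{\theta_{n=17}} &= \frac{1/25 - 1/82}{1/17 - 1/82} \approx 0.596 \qquad (N = 82).
\end{aligned}
```

To first order the ratio-type and exponential estimators share the same factor $\theta$, so their MSEs should also shrink by about 0.596 between n = 17 and n = 25; the classical MSE of ${\bar{y}}_{R N}$ quoted above does so (2.5312/4.2459 ≈ 0.596).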

For dataset 2, samples of sizes n = 15, 20, and 25 were drawn by SRSWOR from a population of size N = 117, representing approximately 13%, 17%, and 21% of the population, respectively.

Table 4 shows that, at sample size n = 15, the MSEs of the estimators ${\bar{y}}_{R N}, {\bar{y}}_{R N_{1}}, {\bar{y}}_{R N_{2}}, {\bar{y}}_{R N_{3}}, {\bar{y}}_{R N_{E}}$, and ${\bar{y}}_{N R e g}$ lie within the intervals [3.1026, 18.8513], [2.9583, 17.7368], [2.1199, 12.7902], [3.0737, 18.6578], [1.8431, 8.7392], and [1.8419, 8.6799], respectively, with corresponding classical MSEs of 0.0259, 0.0258, 0.0148, 0.0259, 0.0103, and 0.0097. The proposed neutrosophic regression estimator ${\bar{y}}_{N R e g}$ therefore has the lowest MSE, in both the neutrosophic and classical approaches, compared with the existing estimators ${\bar{y}}_{R N}, {\bar{y}}_{R N_{1}}, {\bar{y}}_{R N_{2}}, {\bar{y}}_{R N_{3}}$, and ${\bar{y}}_{R N_{E}}$ of Tahir et al. [26, 27]. The neutrosophic exponential estimator ranks second, by a narrow margin, followed by the remaining ratio-type estimators. As the sample size increases from 15 to 25, the MSE of every estimator decreases consistently in both the neutrosophic and classical settings. At n = 25, the MSEs of ${\bar{y}}_{R N}, {\bar{y}}_{R N_{1}}, {\bar{y}}_{R N_{2}}, {\bar{y}}_{R N_{3}}, {\bar{y}}_{R N_{E}}$, and ${\bar{y}}_{N R e g}$ lie within the intervals [1.6790, 10.2019], [1.6009, 9.5987], [1.1472, 6.9218], [1.6634, 10.0972], [0.9974, 4.7295], and [0.9968, 4.6974], respectively, with corresponding classical MSEs of 0.0140, 0.0139, 0.0080, 0.0141, 0.0056, and 0.0052. This decline in MSE with increasing sample size confirms the improved estimation efficiency of the proposed neutrosophic regression estimator in both frameworks. In this real dataset, the classical point values lie well outside (below) the corresponding neutrosophic intervals, which may be attributed to factors such as measurement error, seasonal fluctuation, and environmental variability. 
Hence, the neutrosophic framework appears more appropriate than classical statistics in such situations, since estimates under the neutrosophic framework provide a better understanding and more effective handling of indeterminacy.
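Whether a classical value falls inside a neutrosophic interval can be checked mechanically. The Python sketch below uses a hypothetical helper, `in_interval`, that sorts the two reported bounds first, because the text sometimes lists the larger bound first (e.g., [5.9847, 3.2511]). It then tests two classical MSEs quoted above: ${\bar{y}}_{R N}$ for dataset 1 at n = 17 and ${\bar{y}}_{N R e g}$ for dataset 2 at n = 15.

```python
def in_interval(value: float, bounds: tuple[float, float]) -> bool:
    """True if value lies between the two bounds, whatever order they are reported in."""
    lo, hi = sorted(bounds)
    return lo <= value <= hi

# (classical MSE, neutrosophic MSE interval) pairs taken from the text
checks = {
    "dataset 1, y_RN, n = 17": (4.2459, (5.9847, 3.2511)),
    "dataset 2, y_NReg, n = 15": (0.0097, (1.8419, 8.6799)),
}

for label, (classical, bounds) in checks.items():
    print(label, in_interval(classical, bounds))
# dataset 1, y_RN, n = 17 True
# dataset 2, y_NReg, n = 15 False
```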


Table 4. Comparison of MSE for existing vs. proposed estimators using dataset 2.

For dataset 3, samples of sizes n = 6, 9, and 12 were drawn by SRSWOR from a population of size N = 36. These sample sizes represent approximately 17%, 25%, and 33% of the population, respectively.

Table 5 shows that, at sample size n = 6, the MSEs of the estimators ${\bar{y}}_{R N}, {\bar{y}}_{R N_{1}}, {\bar{y}}_{R N_{2}}, {\bar{y}}_{R N_{3}}, {\bar{y}}_{R N_{E}}$, and ${\bar{y}}_{N R e g}$ lie within the intervals [0.1689, 0.5957], [0.1772, 0.6263], [0.2592, 0.7915], [0.1721, 0.6094], [0.6187, 1.9677], and [0.1021, 0.3731], respectively, with corresponding classical MSEs of 0.2837, 0.2938, 0.3991, 0.2875, 0.7834, and 0.1101. The MSE of the neutrosophic exponential estimator is much larger than that of every other estimator in both the neutrosophic and classical approaches, indicating that the exponential estimator is not suitable in some real-life situations. In contrast, the proposed neutrosophic regression estimator ${\bar{y}}_{N R e g}$ outperforms all the existing estimators ${\bar{y}}_{R N}, {\bar{y}}_{R N_{1}}, {\bar{y}}_{R N_{2}}, {\bar{y}}_{R N_{3}}$, and ${\bar{y}}_{R N_{E}}$ of Tahir et al. [26, 27]. Increasing the sample size from 6 to 12 reduces the MSEs consistently in both the neutrosophic and classical frameworks. At n = 12, the MSEs of ${\bar{y}}_{R N}, {\bar{y}}_{R N_{1}}, {\bar{y}}_{R N_{2}}, {\bar{y}}_{R N_{3}}, {\bar{y}}_{R N_{E}}$, and ${\bar{y}}_{N R e g}$ lie within the intervals [0.0676, 0.2383], [0.0709, 0.2505], [0.1037, 0.3166], [0.0688, 0.2438], [0.2475, 0.7871], and [0.0408, 0.1492], respectively, with corresponding classical MSEs of 0.1134, 0.1175, 0.1596, 0.1150, 0.3133, and 0.0440. This consistent decrease in MSE with increasing sample size confirms the efficiency of these estimators in both frameworks. The neutrosophic framework provides interval-valued MSEs that capture a range of uncertainty, in contrast to the point values of the classical framework. 
For this demographic dataset, the classical MSE falls within the neutrosophic MSE interval for every estimator. This suggests that the neutrosophic regression estimator handles uncertainty with precision and robustness.
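The ranking in dataset 3 can be read off mechanically. The Python sketch below ranks the six estimators by the first bound, the second bound, and the classical value of the n = 6 MSEs quoted above; the labels RN, RN1, RN2, RN3, RNE, and NReg are shorthand for ${\bar{y}}_{R N}$ through ${\bar{y}}_{N R e g}$, and the helper `ranking` is hypothetical.

```python
# Dataset 3, n = 6: (neutrosophic MSE interval, classical MSE), as quoted in the text
mse = {
    "RN":   ((0.1689, 0.5957), 0.2837),
    "RN1":  ((0.1772, 0.6263), 0.2938),
    "RN2":  ((0.2592, 0.7915), 0.3991),
    "RN3":  ((0.1721, 0.6094), 0.2875),
    "RNE":  ((0.6187, 1.9677), 0.7834),
    "NReg": ((0.1021, 0.3731), 0.1101),
}

def ranking(key) -> list[str]:
    """Estimator labels ordered from smallest to largest MSE under the given key."""
    return sorted(mse, key=key)

by_first_bound = ranking(lambda k: mse[k][0][0])
by_second_bound = ranking(lambda k: mse[k][0][1])
by_classical = ranking(lambda k: mse[k][1])

print(by_first_bound)
print(by_second_bound)
print(by_classical)
```

All three orderings run from NReg (smallest MSE) to RNE (largest), which is the pattern described above.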


Table 5. Comparison of MSE for existing vs. proposed estimators using dataset 3.

For dataset 4, samples of sizes n = 2, 3, and 4 were drawn by SRSWOR from a population of size N = 12. These sample sizes represent approximately 17%, 25%, and 33% of the population, respectively.

Table 6 shows that, at sample size n = 2, the MSEs of the existing estimators ${\bar{y}}_{R N}, {\bar{y}}_{R N_{1}}, {\bar{y}}_{R N_{2}}, {\bar{y}}_{R N_{3}}, {\bar{y}}_{R N_{E}}$ and of the proposed estimator ${\bar{y}}_{N R e g}$ lie within the intervals [318844.3, 811371.7], [314515.0, 807758.0], [299088.7, 793749.0], [315957.9, 809267.5], [341817.9, 944036.4], and [271088.9, 783025.3], respectively, with corresponding classical MSEs of 333795.2, 327713.7, 306574.6, 329745.4, 303501.0, and 254138.6. In the neutrosophic framework, the neutrosophic exponential ratio estimator has the highest MSE interval of all estimators, indicating that it is not suitable in every real-life scenario; in the classical framework, however, its MSE (303501.0) is the lowest among the existing estimators. The proposed neutrosophic regression estimator has the lowest MSE in both frameworks. As the sample size increases from 2 to 4, the MSEs of all estimators decline consistently. At n = 4, the MSEs of ${\bar{y}}_{R N}, {\bar{y}}_{R N_{1}}, {\bar{y}}_{R N_{2}}, {\bar{y}}_{R N_{3}}, {\bar{y}}_{R N_{E}}$, and ${\bar{y}}_{N R e g}$ lie within the ranges [127537.7, 324548.7], [125806.0, 323103.2], [119635.5, 317499.6], [126383.2, 323707.0], [136727.2, 377625.3], and [108435.6, 313210.1], respectively, with corresponding classical MSEs of 133518.1, 131085.5, 122619.0, 131898.2, 121400.0, and 101655.0. This decrease in MSE with increasing sample size supports the efficiency of all the estimators. The neutrosophic framework delivers interval-valued MSEs that account for data uncertainty from various sources, whereas the classical framework provides only point values and does not reflect these uncertainties, making it less suitable in the present data context. 
This highlights the ability of neutrosophic estimators to manage diverse forms of indeterminacy in real-world data. Thus, the proposed neutrosophic estimator offers enhanced reliability and robustness for handling indeterminate datasets.
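Assuming, as for SRSWOR estimators generally, that the first-order MSE is proportional to the finite-population factor 1/n − 1/N, the classical MSEs of ${\bar{y}}_{R N}$ reported for the four datasets should shrink from the smallest to the largest sample size by the ratio of these factors. The Python sketch below checks this with the figures given in the text; `fpc` and `cases` are illustrative names only.

```python
def fpc(n: int, N: int) -> float:
    """Finite-population factor 1/n - 1/N that multiplies the first-order MSE under SRSWOR."""
    return 1.0 / n - 1.0 / N

# (N, smallest n, largest n, classical MSE of y_RN at smallest n, at largest n), from the text
cases = {
    "dataset 1": (82, 17, 25, 4.2459, 2.5312),
    "dataset 2": (117, 15, 25, 0.0259, 0.0140),
    "dataset 3": (36, 6, 12, 0.2837, 0.1134),
    "dataset 4": (12, 2, 4, 333795.2, 133518.1),
}

for name, (N, n_small, n_large, mse_small, mse_large) in cases.items():
    predicted = fpc(n_large, N) / fpc(n_small, N)  # ratio implied by the 1/n - 1/N factor
    observed = mse_large / mse_small               # ratio of the reported classical MSEs
    print(f"{name}: predicted {predicted:.4f}, observed {observed:.4f}")
```

The predicted and observed ratios agree up to the rounding of the reported MSEs (both are 0.4 for datasets 3 and 4), which suggests that the decline in MSE with n reported above is driven by the finite-population factor.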


Table 6. Comparison of MSE for existing vs. proposed estimators using dataset 4.

The percent relative efficiency (PRE) was also used to compare the proposed neutrosophic regression estimator with the existing estimators ${\bar{y}}_{R N}, {\bar{y}}_{R N_{1}}, {\bar{y}}_{R N_{2}}, {\bar{y}}_{R N_{3}}$, and ${\bar{y}}_{R N_{E}}$; the results are presented in Table 7.


Table 7. Comparison of PRE for existing vs. proposed estimators.

According to Table 7, for dataset 1 the PRE of the proposed neutrosophic regression estimator ${\bar{y}}_{N R e g}$ is [100.00, 100.00], which serves as the baseline against which the other estimators are compared. The PREs of the estimators ${\bar{y}}_{R N}, {\bar{y}}_{R N_{1}}, {\bar{y}}_{R N_{2}}, {\bar{y}}_{R N_{3}}$, and ${\bar{y}}_{R N_{E}}$ lie within the intervals [155.46, 185.13], [101.41, 100.38], [128.21, 107.56], [101.99, 110.76], and [100.18, 106.13], respectively, with corresponding classical PREs of 751.38, 146.15, 113.43, 262.27, and 189.61. Since every PRE exceeds 100, the proposed neutrosophic regression estimator is more efficient than all existing estimators in both the neutrosophic and classical frameworks. The estimators ${\bar{y}}_{R N}$ and ${\bar{y}}_{R N_{3}}$ have much higher PREs in the classical framework than in the neutrosophic framework, suggesting that the neutrosophic results are more stable and realistic under uncertainty. Similarly, for dataset 2, the PREs of the existing estimators ${\bar{y}}_{R N}, {\bar{y}}_{R N_{1}}, {\bar{y}}_{R N_{2}}, {\bar{y}}_{R N_{3}}$, and ${\bar{y}}_{R N_{E}}$ lie within the intervals [168.45, 217.18], [160.61, 204.36], [115.09, 147.35], [166.88, 214.95], and [100.12, 100.68], respectively, with corresponding classical PREs of 267.24, 265.67, 152.82, 266.93, and 106.71. These findings show that the proposed neutrosophic regression estimator performs better than the existing estimators in both settings. Moreover, the inflated PRE values under the classical setup underline the limitations of classical estimators in handling data uncertainty, and thereby the broader applicability and effectiveness of the neutrosophic approach in real-world, indeterminate scenarios. 
For dataset 3, the PREs of the existing estimators ${\bar{y}}_{R N}, {\bar{y}}_{R N_{1}}, {\bar{y}}_{R N_{2}}, {\bar{y}}_{R N_{3}}$, and ${\bar{y}}_{R N_{E}}$ relative to the proposed neutrosophic regression estimator ${\bar{y}}_{N R e g}$ lie within the intervals [165.43, 159.66], [173.56, 167.86], [253.87, 212.14], [168.56, 163.34], and [605.97, 527.39], respectively, with corresponding classical PREs of 257.59, 266.82, 362.46, 261.15, and 711.49. The proposed neutrosophic regression estimator thus outperforms all the existing estimators in both the neutrosophic and classical approaches. The neutrosophic exponential estimator performs worst: the estimators ${\bar{y}}_{R N}, {\bar{y}}_{R N_{1}}, {\bar{y}}_{R N_{2}}$, and ${\bar{y}}_{R N_{3}}$ are several times more efficient than it under both the neutrosophic and classical frameworks, even though the variables are highly correlated. Additionally, the PRE values under the classical framework are inflated relative to those under the neutrosophic framework, suggesting that classical estimators may overstate efficiency when data are uncertain; this further underlines the applicability and robustness of the neutrosophic estimators in real-world scenarios characterized by indeterminacy and imprecision. Similarly, for dataset 4, the PREs of the estimators ${\bar{y}}_{R N}, {\bar{y}}_{R N_{1}}, {\bar{y}}_{R N_{2}}, {\bar{y}}_{R N_{3}}$, and ${\bar{y}}_{R N_{E}}$ lie within the intervals [117.62, 103.62], [116.02, 103.16], [110.32, 101.37], [116.55, 103.35], and [126.09, 120.56], respectively, with corresponding classical PREs of 131.34, 128.95, 120.62, 129.75, and 119.42. 
Based on these results, the proposed neutrosophic regression estimator is the most efficient of all the estimators compared. The PREs of the neutrosophic exponential estimator, by contrast, show that the estimators ${\bar{y}}_{R N}, {\bar{y}}_{R N_{1}}, {\bar{y}}_{R N_{2}}$, and ${\bar{y}}_{R N_{3}}$ can perform much better than it in some real-world scenarios, so the neutrosophic exponential estimator can fail on some real-world problems even with highly correlated data. The proposed estimator, in addition, has wide applicability for handling uncertainty in imprecise and indeterminate data.
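The PRE intervals quoted above appear to be computed bound by bound as PRE = MSE(existing)/MSE(proposed) × 100: the dataset 1 figures for ${\bar{y}}_{R N}$ and ${\bar{y}}_{R N_{E}}$ can be reproduced from the n = 17 MSEs given earlier. The Python sketch below assumes that definition; `pre` and `pre_interval` are hypothetical helpers.

```python
def pre(mse_existing: float, mse_proposed: float) -> float:
    """Percent relative efficiency of the proposed estimator with respect to an existing one."""
    return 100.0 * mse_existing / mse_proposed

def pre_interval(existing: tuple[float, float], proposed: tuple[float, float]) -> tuple[float, ...]:
    """Apply the PRE formula bound by bound, keeping the order in which bounds are reported."""
    return tuple(round(pre(e, p), 2) for e, p in zip(existing, proposed))

# Dataset 1, n = 17: neutrosophic MSE intervals quoted in the text
reg = (3.8496, 1.7561)                       # proposed y_NReg
print(pre_interval((5.9847, 3.2511), reg))   # y_RN   -> (155.46, 185.13)
print(pre_interval((3.8567, 1.8637), reg))   # y_RN_E -> (100.18, 106.13)
```

Because the proposed estimator is the denominator, its own PRE is 100 by construction, and any value above 100 marks an existing estimator as less efficient.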

4.2 Evaluation of the proposed neutrosophic regression estimator using simulated data

To validate the robustness of the proposed neutrosophic regression estimator, a simulation study was conducted by generating a large neutrosophic population in which the neutrosophic study variable $Y_N$ and auxiliary variable $X_N$ follow neutrosophic normal distributions (NND): $Y_{N} \sim NN(\mu_{yN}, \sigma_{yN}^{2})$ with $Y_N \in (Y_L, Y_U)$, $\mu_{yN} \in (\mu_{yL}, \mu_{yU})$, and $\sigma_{yN}^{2} \in (\sigma_{yL}^{2}, \sigma_{yU}^{2})$, and $X_{N} \sim NN(\mu_{xN}, \sigma_{xN}^{2})$ with $X_N \in (X_L, X_U)$, $\mu_{xN} \in (\mu_{xL}, \mu_{xU})$, and $\sigma_{xN}^{2} \in (\sigma_{xL}^{2}, \sigma_{xU}^{2})$. Specifically, $Y_N \sim NN([76.0, 84.9], [(12.9)^2, (17.2)^2])$, so that $\mu_{yN} \in (76.0, 84.9)$ and $\sigma_{yN} \in (12.9, 17.2)$, and $X_N \sim NN([171.2, 180.4], [(5.8)^2, (6.7)^2])$, so that $\mu_{xN} \in (171.2, 180.4)$ and $\sigma_{xN} \in (5.8, 6.7)$. Simulations were run in R software with population sizes of 2,000 and 10,000, using 10,000 iterations.

Table 8 summarizes the descriptive statistics of the simulated neutrosophic data used to evaluate the proposed neutrosophic regression-type estimator. A neutrosophic normal distribution (NND) was assumed for both the study variable $Y_N$ and the auxiliary variable $X_N$, with specified levels of variability and indeterminacy. From the simulated population of size 2,000, a sample of size 50 was drawn by SRSWOR. The neutrosophic means of the study and auxiliary variables lay within the intervals [75.0504, 84.0164] and [169.8597, 181.2128], respectively, and their neutrosophic coefficients of variation within [0.1774, 0.2471] and [0.0718, 0.0777], respectively. The neutrosophic kurtosis lay within [2.9096, 3.7944], and the neutrosophic correlation between the study and auxiliary variables within [0.0381, 0.1615], indicating a weak correlation. Similarly, from the simulated population of size 10,000, a sample of size 500 was drawn by SRSWOR. The neutrosophic means of the study and auxiliary variables lay within [75.6511, 85.0679] and [170.8981, 180.4626], respectively, and their neutrosophic coefficients of variation within [0.18667, 0.25904] and [0.06233, 0.06744], respectively. The neutrosophic kurtosis lay within [2.80087, 2.65350], and the neutrosophic correlation within [0.00721, −0.03439], that is, close to zero with a negative upper bound.


Table 8. Descriptive statistics for simulation under neutrosophic data.

Table 9 summarizes the results of the simulation study, in which the population size was increased from 2,000 to 10,000, with corresponding sample sizes of 50 and 500, respectively. The findings support the robustness and consistency of the proposed estimator across sample sizes and correlation levels. At a population size of 2,000 with a sample size of 50, the mean squared error (MSE) of the proposed neutrosophic regression estimator lay within the interval [3.4525, 8.1830], the lowest among all the estimators evaluated, even though the neutrosophic correlation was weak, at [0.0381, 0.1615]. With a population size of 10,000 and a sample size of 500, the MSEs of all estimators decreased, as expected with larger samples; the MSE of the proposed estimator fell to [0.3789, 0.9215], again the lowest among all estimators compared. This was achieved with a near-zero neutrosophic correlation of [0.00721, −0.03439], whose upper bound is negative, showing that the proposed estimator remains the most efficient of those compared even when the correlation is weak or unfavorable. The PRE values of all competing estimators exceed 100, the value to which the PRE of the proposed estimator is standardized, meaning that each is less efficient than the proposed estimator. These simulation outcomes therefore support the superiority of the proposed neutrosophic regression estimator over the existing estimators at both the smaller and the larger sample size considered, and its applicability to data characterized by indeterminacy and imprecision.


Table 9. Comparison of MSE and PRE for existing vs. proposed estimators under simulation.

These results indicate that the proposed neutrosophic regression estimator outperforms all existing estimators. They agree with the results obtained on the real-world datasets and further reinforce the robustness, consistency, and superiority of the proposed estimator for data with high, moderate, or weak correlation, as well as a near-zero correlation with a negative bound.

5 Conclusion

The proposed neutrosophic regression estimator extends the classical regression estimator and assumes that the neutrosophic study and auxiliary variables follow a linear relationship within the interval-based framework. In this study, it was developed for estimating the finite population mean in the presence of indeterminate, imprecise, and vague data. When $I_N = 0$, that is, when indeterminacy is absent, the proposed estimator reduces to the classical regression estimator, ensuring compatibility with traditional methods. The estimator's bias and mean squared error (MSE) were derived using a Taylor series expansion, and the MSE was minimized by differentiation to obtain the optimal estimator. Its performance was evaluated in both the neutrosophic and classical frameworks using simulated data and four real-world datasets on blood pressure, temperature, natural growth rate, and solar energy. Beyond these datasets, the proposed estimator has potential applications in various domains. In agriculture, it can improve crop yield forecasting by accounting for uncertainty in weather and soil conditions. In public health, it can enhance the analysis of medical measurements with inherent variability, such as cholesterol levels and heart rate variability. In the social sciences, it can be applied to indicators such as employment rates and economic growth rates. The “neutroSurvey” R package further enables its use in large-scale surveys, such as national census data analysis or economic forecasting, where data indeterminacy is common. In the simulated data, the correlation between the study and auxiliary variables was [0.0381, 0.1615] and [0.00721, −0.03439] for population sizes of 2,000 and 10,000 with sample sizes of 50 and 500, respectively. 
In the real datasets, the correlation was [0.3932, 0.5503], [0.6126, 0.6759], [0.9652, 0.9585], and [0.8216, 0.7391]. Together with the simulated data, these values span weak to strong positive correlation as well as a near-zero correlation with a negative bound. The proposed neutrosophic regression estimator consistently outperformed the existing neutrosophic ratio and neutrosophic exponential estimators across this whole range of correlation. Because this study focuses on demonstrating that the proposed estimator performs better than the existing ones on neutrosophic data, a comparison with a non-neutrosophic scenario was not performed. The proposed neutrosophic regression estimator may therefore enhance decision-making in fields where data are often imprecise or incomplete, such as agriculture, health, and environmental studies.

6 Limitations and future studies

The proposed methodology applies only to single-stage sampling designs, whereas real-life large-scale surveys often use complex sampling designs with complex probability structures, such as multistage sampling, probability proportional to size (PPS) sampling, and multiphase sampling. Furthermore, model-based and model-assisted estimation is preferred in many survey settings for producing official statistics, and the classical regression estimator is a special case of the model-assisted generalized regression (GREG) estimator. Future research could therefore develop neutrosophic estimators for more complex sampling designs and estimation procedures. Extensions to multivariate settings, or integration with robust and machine learning-based methods, can also be explored.

Data availability statement

Publicly available datasets were analyzed in this study. This data can be found here: https://ncdrisc.org/index, https://data.gov.in/resource/seasonal-and-annual-minimum-maximum-temperature-series-1901-2017, https://censusindia.gov.in/nada/index.php/catalog/42687.

Author contributions

NP: Data curation, Visualization, Methodology, Project administration, Validation, Conceptualization, Writing – original draft, Software, Formal analysis, Investigation, Writing – review & editing, Resources. KA: Visualization, Resources, Project administration, Data curation, Formal analysis, Validation, Methodology, Software, Writing – review & editing, Investigation, Writing – original draft, Conceptualization, Supervision. Bharti: Data curation, Visualization, Project administration, Validation, Supervision, Writing – original draft, Investigation, Formal analysis, Writing – review & editing. PD: Data curation, Validation, Project administration, Writing – review & editing, Supervision, Formal analysis, Investigation, Writing – original draft, Software. TA: Data curation, Visualization, Writing – original draft, Project administration, Resources, Formal analysis, Writing – review & editing, Supervision.

Funding

The author(s) declare that no financial support was received for the research and/or publication of this article.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Gen AI was used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fams.2025.1658157/full#supplementary-material

References

1. Office of the Registrar General & Census Commissioner, India (ORGI). Sample Registration System (SRS)-Bulletin 2020, Vol. 55-1 (2022). Available online at: https://censusindia.gov.in/nada/index.php/catalog/42687 (Accessed August 27, 2025).


2. Mallik S, Mohanty S, Shankar B, Mishra P. Recommendation system using neutrosophic logic in agriculture. Int J Intell Syst Appl Agric. (2024) 12:735–41. doi: 10.1289/isesisee.2018.P01.1470


3. Smarandache F. Neutrosophy: neutrosophic probability, set, and logic: analytic synthesis & synthetic analysis. Rehoboth: American Research Press. (1998) 105:118–23.


4. Smarandache F. Introduction to Neutrosophic Statistics. Craiova: Sitech & Education Publishing (2014).


5. Alhabib R, Ranna MM, Haitham F, Salama AA. Some neutrosophic probability distributions. Neutrosophic Sets Syst. (2018) 22:30–8. doi: 10.5281/zenodo.2160478


6. Aslam M. Neutrosophic analysis of variance: application to university students. Complex Intell Syst. (2019) 5:403–7. doi: 10.1007/s40747-019-0107-2


7. Aslam M. Monitoring the road traffic crashes using NEWMA chart and repetitive sampling. Int J Inj Contr Saf Promot. (2020) 28:39–45. doi: 10.1080/17457300.2020.1835990


8. Aslam M. A study on skewness and kurtosis estimators of wind speed distribution under indeterminacy. Theor Appl Climatol. (2021) 143:1227–34. doi: 10.1007/s00704-020-03509-5


9. Aslam M. Analyzing Gray cast iron data using a new Shapiro-Wilks test for normality under indeterminacy. Int J Cast Met Res. (2021) 34:1–5. doi: 10.1080/13640461.2020.1846959


10. Aslam M. Testing average wind speed using sampling plan for Weibull distribution under indeterminacy. Sci Rep. (2021) 11:1–9. doi: 10.1038/s41598-021-87136-8


11. Woodall WH, Driscoll DC, Montgomery DC. A review and perspective on neutrosophic statistical process monitoring methods. IEEE Access (2022) 10:100456–62. doi: 10.1109/ACCESS.2022.3207188


12. Aslam M. Radar data analysis in the presence of uncertainty. Eur J Remote Sens. (2021) 54:140–4. doi: 10.1080/22797254.2021.1886597


13. Aslam M. Chi-square test under indeterminacy: an application using pulse count data. BMC Med Res Methodol. (2021) 21:201. doi: 10.1186/s12874-021-01400-z


14. Aslam M. On testing autocorrelation in metrology data under indeterminacy. Mapan. (2021) 36:515–9. doi: 10.1007/s12647-021-00429-1


15. Aslam M. Neutrosophic statistical test for counts in climatology. Sci Rep. (2021) 11:1–5. doi: 10.1038/s41598-021-97344-x


16. Aslam M, Shafqat A, Albassam M, Malela-Majika JC, Shongwe SC. A new CUSUM control chart under uncertainty with applications in petroleum and meteorology. PLoS ONE. (2021) 16:e0246185. doi: 10.1371/journal.pone.0246185


17. Aslam M, Sherwani RAK, Saleem M. Vague data analysis using neutrosophic Jarque-Bera test. PLoS ONE. (2021) 16:e0260689. doi: 10.1371/journal.pone.0260689


18. Aslam M, Khan N. Normality test of temperature in Jeddah city using Cochran's test under indeterminacy. Mapan. (2021) 36:589–98. doi: 10.1007/s12647-020-00428-8


19. Aslam M, Algarni A. Analysing the solar energy data using a new Anderson-Darling test under indeterminacy. Int J Photoenergy. (2020) 2020:6662389. doi: 10.1155/2020/6662389


20. Sisodia B, Dwivedi V. Modified ratio estimator using coefficient of variation of auxiliary variable. J Indian Soc Agric Stat. (1981) 33:13–8.


21. Upadhyaya LN, Singh HP. On the estimation of the population mean with known coefficient of variation. Biom J. (1984) 26:915–22. doi: 10.1002/bimj.4710260814


22. Singh HP, Tailor R, Kakran M. An improved estimator of population mean using power transformation. J Indian Soc Agric Stat. (2004) 58:223–30.


23. Kadilar C, Cingi H. Ratio estimators for the population variance in simple and stratified random sampling. Appl Math Comput. (2006) 173:1047–59. doi: 10.1016/j.amc.2005.04.032


24. Bahl S, Tuteja RK. Ratio and product type exponential estimators. J Inf Optim Sci. (1991) 12:159–64. doi: 10.1080/02522667.1991.10699058


25. Gupta S, Shabbir J, Sousa R, Corte Real P. Ratio estimation of the mean of a sensitive variable in the presence of auxiliary information. Commun Stat Theory Methods. (2012) 41:2394–404. doi: 10.1080/03610926.2011.641654


26. Tahir Z, Khan H, Alamri FS, Aslam M, Aljohani HM. Neutrosophic ratio-type estimators for estimating the population mean. Complex Intell Syst. (2021) 7:2991–3001. doi: 10.1007/s40747-021-00439-1


27. Tahir Z, Khan H, Alamri FS, Aslam M. Neutrosophic ratio-type exponential estimators for estimation of population mean. J Intell Fuzzy Syst. (2023) 45:4559–83. doi: 10.3233/JIFS-223539


28. Alqudah MA, Zayed M, Subzar M, Wan SA. Neutrosophic robust ratio-type estimator for estimating finite population mean. Heliyon. (2024) 10:e28934. doi: 10.1016/j.heliyon.2024.e28934


29. Singh R, Kumari A, Smarandache F, Tiwari SN. Construction of almost unbiased estimator for population mean using neutrosophic information. Neutrosophic Sets Syst. (2024) 76:449–63. doi: 10.5281/zenodo.14010268


30. Yadav VK, Majhi D, Alkhathami AA, Prasad S. Neutrosophic mean estimators using extreme indeterminate observations in sample surveys. Neutrosophic Sets Syst. (2025) 80:1. doi: 10.5281/zenodo.14707260


31. R package ‘neutroSurvey’ (2025). Available online at: https://CRAN.R-project.org/package=neutroSurvey (Accessed August 6, 2025).


Keywords: classical statistics, neutrosophic statistics, neutrosophic estimator, bias, percent relative efficiency (PRE)

Citation: Purwar N, Aditya K, Bharti, Das P and Ahmad T (2025) Neutrosophic regression type estimator for the finite population mean and its applications in real data scenarios. Front. Appl. Math. Stat. 11:1658157. doi: 10.3389/fams.2025.1658157

Received: 02 July 2025; Accepted: 18 August 2025;
Published: 17 September 2025.

Edited by:

Surapati Pramanik, Nandalal Ghosh B.T. College, India

Reviewed by:

Zakariya Yahya Algamal, University of Mosul, Iraq
Muhammad Aslam, King Abdulaziz University, Saudi Arabia

Copyright © 2025 Purwar, Aditya, Bharti, Das and Ahmad. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Kaustav Aditya, katu4493@gmail.com
