Edited by: Beate Pinior, University of Veterinary Medicine Vienna, Austria
Reviewed by: Brecht Devleesschauwer, Sciensano, Belgium; Chiara Antonini, ICT4life, Italy
This article was submitted to Veterinary Epidemiology and Economics, a section of the journal Frontiers in Veterinary Science
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Latent class analysis is a well-established method in human and veterinary medicine for evaluating the accuracy of diagnostic tests without a gold standard. An important assumption of this procedure is the conditional independence of the tests. If tests based on the same biological principle are used, this assumption is no longer met, and the model has to be adapted so that the dependencies between the tests can be taken into account. Our approach extends the traditional latent class model with a term for the conditional dependency of the tests. This extension increases the number of parameters to be estimated and leads to negative degrees of freedom of the model, meaning that the data do not contain enough information to obtain a unique estimate. Hence, an iterative algorithm was developed to keep the number of parameters estimated in each step small. Given adequate starting values, our approach first estimates the conditional dependencies and then regards the resulting values as fixed to recalculate the test accuracies and the prevalence with the same method used for independent tests. Subsequently, the new values of the test accuracies and the prevalence are used to recalculate the terms for the conditional dependencies. These two steps are repeated until the model converges. We simulated five application scenarios based on diagnostic tests used in veterinary medicine. The results suggest that our method and the Bayesian approach produce similarly precise results. However, while the presented approach is able to calculate more accurate results than the Bayesian approach if the test accuracies are initially misjudged, the estimates of the Bayesian method are more precise when incorrect dependencies are assumed. This finding shows that our approach is a useful addition to the existing Bayesian methods, with the advantage of allowing simpler and more objective estimation.
Information about the occurrence of livestock diseases in an animal population is important in many applications, such as surveillance and vaccination programs or the verification of freedom from disease. Therefore, the disease status of the individual animals is assessed using a diagnostic test. However, every test also produces some incorrect results, which, depending on the disease, may have serious economic, social or political consequences. Such misclassifications can be reduced by sequentially examining a subset of the animals with a different test or by testing all the animals with multiple diagnostic tests from the outset and thus confirming the diagnosis (
The latter was the chosen approach in a field trial to determine the prevalence of
However, the test accuracy depends on many biological factors, such as the animal species, breed, sex and immune history. For this reason, the diagnostic test accuracy varies across populations, and the values obtained in clinical evaluation studies are only transferable to field settings to a limited extent (
In this context, latent class analysis (LCA) is based on the assumption that observed categorical indicators imperfectly measure an underlying latent structure. By sampling the values of the categorical variable for a set of observations, the method is able to discover the latent structure and the error in the indicators. Applying this principle to the field of diagnostic test evaluation, the true unknown disease status is measured by observed diagnostic tests. By analyzing the response pattern of a set of tests, the prevalence of the disease in the sample and the diagnostic accuracy of every test used in this model can be discovered. Hence, the specific test performance under the given conditions such as the study settings and the structure of the subpopulation can be estimated.
The Bayesian approach to latent class models of the test accuracy is widely used in veterinary medicine (
Some Bayesian methods allow the consideration of conditionally dependent tests. A fixed effects model and a random effects model for data from a single population were developed in one approach (
There are also some frequentist approaches for incorporating a dependence structure into the latent class analysis. A latent marginal model (
Although the solutions addressed above are available, we propose a frequentist method for estimating the prevalence, diagnostic test accuracy and dependence structure because we would like to present an easy-to-apply approach even for situations with no accessible prior information. The solution was intended to fit even when only three diagnostic tests are available and the status of each individual is unknown. We present the model as well as the algorithm and discuss its performance in different simulated scenarios, which were adopted from real-world examples in veterinary medicine to examine the performance of our method under different circumstances. The non-mathematically inclined reader may skip the following three subsections describing the statistical model.
In a latent class model, it is assumed that there is a latent variable with
A key assumption of the approach is the conditional independence (i.e., independence given the true disease status) of all the tests (
If diagnostic tests are independent, the conditional response probabilities result from the product of the tests' individual response probabilities as described in the likelihood (b). It can be written in the simplified term (c) for three diagnostic tests
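As a minimal illustration (the function name and call are ours, not taken from the authors' implementation), the following R snippet evaluates this product form for one response pattern and mixes the two latent classes by the prevalence:

```r
# Probability of one response pattern y = (y1, y2, y3), with 1 = test positive,
# under conditional independence of the three tests given the latent status
pattern_prob_indep <- function(y, prev, se, sp) {
  p_pos <- prod(se^y * (1 - se)^(1 - y))   # P(pattern | infected)
  p_neg <- prod((1 - sp)^y * sp^(1 - y))   # P(pattern | non-infected)
  prev * p_pos + (1 - prev) * p_neg        # mixture over the two latent classes
}

# Example: all three tests positive, using the values of simulation scenario 1
pattern_prob_indep(c(1, 1, 1), prev = 0.30,
                   se = c(0.90, 0.85, 0.90), sp = c(0.95, 0.95, 0.99))
```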
The pairwise dependencies
Some authors use the terms “dependency” and “correlation” interchangeably (
The latent disease status determines the correct diagnosis of an observation, while other external factors trigger a misdiagnosis. Thus, only matching incorrect results are of interest to assess the dependency of the tests. The proportion of incorrect results, i.e., the proportion of incorrectly diagnosed animals, is determined by the accuracy of the test, which causes specific restrictions for the dependency parameter settings (
For the dependencies to be comparable, they have to be detached from the test accuracies. In the case of pairwise dependencies, this is achieved by standardizing Formula (d), which yields Formula (f). The three-test dependencies are standardized analogously.
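Formulas (d)–(f) are not reproduced here. One standardization that is consistent with this description (and with the raw and standardized values reported in the parameter table further below) treats the pairwise dependency as the correlation of the two tests' error indicators within a latent class, e.g., for the infected class

$$\operatorname{cov}_{ij}^{D+} = P(T_i = 0,\ T_j = 0 \mid D+) - (1 - Se_i)(1 - Se_j), \qquad \rho_{ij}^{D+} = \frac{\operatorname{cov}_{ij}^{D+}}{\sqrt{Se_i (1 - Se_i)\, Se_j (1 - Se_j)}},$$

and analogously for the non-infected class with the specificities in place of the sensitivities.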
The dependencies of the diagnostic tests are calculated within both latent classes and remain constant for all possible combinations of the results of the three tests in the respective class. Therefore, these values can also be determined by using all observed response patterns. Only the signs of the dependencies have to be adjusted according to matching or differing test results. Rearranging these equations results in the functions (g) of the conditional item response probabilities for the class of non-infected animals
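As a simplified sketch of this idea, the following R function adjusts the class-conditional probability of a response pattern by pairwise dependency terms whose sign follows the matching/differing rule described above. This is only one possible parameterization, restricted to pairwise terms, and not necessarily identical to the authors' functions (g), which also include a three-test dependency.

```r
# Simplified class-conditional probability of a response pattern y = (y1, y2, y3)
# with pairwise dependencies. 'marg' holds the probabilities of a positive result
# in the given class (Se_i for infected animals, 1 - Sp_i for non-infected
# animals); 'cov' holds the raw pairwise dependencies for the pairs (1,2), (1,3)
# and (2,3). Each dependency enters with a positive sign if the pair of results
# matches and a negative sign if it differs, weighted by the remaining test's
# marginal so that the eight pattern probabilities sum to one.
pattern_prob_dep <- function(y, marg, cov) {
  p <- marg^y * (1 - marg)^(1 - y)              # per-test marginal probabilities
  pairs <- rbind(c(1, 2), c(1, 3), c(2, 3))
  out <- prod(p)
  for (k in 1:3) {
    s <- if (y[pairs[k, 1]] == y[pairs[k, 2]]) 1 else -1
    out <- out + s * cov[k] * p[setdiff(1:3, pairs[k, ])]
  }
  out
}

# Example: infected class of scenario 4; the eight pattern probabilities sum to 1
patterns <- as.matrix(expand.grid(0:1, 0:1, 0:1))
probs <- apply(patterns, 1, pattern_prob_dep,
               marg = c(0.80, 0.66, 0.70), cov = c(0.038, 0.046, 0.087))
sum(probs)
```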
The (standardized) dependency (e) indicates the strength of the interdependence of the tests, i.e., the share of their concordant false diagnoses. To interpret this measure, the size and direction should be considered. If two tests completely agree on their incorrect diagnosis (i.e., both tests assign the incorrect disease status to exactly the same animals), then they have a (standardized) dependency of 1. If the two tests agree only at random regarding the incorrect diagnosis of the disease status, the observed agreement between these tests matches the expected agreement between two independent tests. Hence, the (standardized) dependency is zero. The dependency is negative when there are fewer matching results than expected by chance. Thus, this measure has an interpretation similar to Cohen's Kappa. However, here, negative values play a subordinate role in the application to diagnostic tests, since similar testing principles tend to lead to increased agreement in incorrect decisions. It is very unlikely that two tests with a similar test procedure have a negative dependency, as this phenomenon would imply that higher biological similarity leads to a lower level of agreement.
To obtain a better idea of the magnitude of the conditional dependency, published studies that contain observations with a confirmed latent status can be consulted. Although publications with the necessary information are rare and provide only a rough indication of the size of the dependencies realized under the given study conditions, they may yield valuable starting values for subsequent analyses. The following examples set the framework for the magnitude of the dependency in our simulation.
First, we calculated the standardized dependency from study data on toxoplasmosis in pigs (
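When individual test results with a confirmed infection status are available, as in the studies mentioned here, such a standardized dependency can be computed directly. A small R sketch, assuming the correlation-type standardization described above (the data in the example are invented for illustration):

```r
# Standardized dependency of two tests within one latent class, computed as the
# correlation of their binary error indicators (here: false negatives among
# animals with a confirmed infection). t1, t2: test results (1 = positive);
# status: confirmed true status (1 = infected).
standardized_dependency <- function(t1, t2, status) {
  fn1 <- 1 - t1[status == 1]   # false-negative indicator of test 1
  fn2 <- 1 - t2[status == 1]   # false-negative indicator of test 2
  cor(fn1, fn2)
}

# Invented example data for ten confirmed-infected animals
t1 <- c(1, 1, 0, 1, 1, 0, 1, 1, 1, 0)
t2 <- c(1, 1, 0, 1, 0, 0, 1, 1, 1, 1)
standardized_dependency(t1, t2, status = rep(1, 10))
```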
The previous subsections described how the classical frequentist latent class analysis can be extended by a term describing the conditional dependencies between the diagnostic tests. Because these additional dependency parameters leave the model underidentified, an iterative algorithm is proposed. Here, we present a solution for the use of three diagnostic tests. The basic idea of the algorithm is to consider alternately the test accuracies and the conditional dependencies as fixed values. Thus, the method presented here has the advantage of always resulting in a positive number of degrees of freedom in each iteration step for three tests in one population compared to other methods for the estimation of test accuracies for conditionally dependent tests [e.g., (
Choose suitable starting values for the test accuracies and the conditional dependencies between the tests.
Consider the conditional dependencies as fixed. Execute the expectation maximization (EM) algorithm to estimate the best-fit test accuracies for the data. For this step, we followed the EM algorithm described for a conditionally independent latent class approach (
Recalculate the conditional dependency in two substeps:
Use the conditional dependencies and the test accuracies from the previous step to calculate the latent class membership probabilities.
With this knowledge of the latent status, determine the conditional dependency by using Formulas (d, e).
Start again with step (ii) until the model converges, i.e., the log-likelihood of two consecutive models differs by <0.00001 or the algorithm reaches 1,000 iterations. A minimal sketch of this alternating loop is given below.
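The sketch is simplified in several respects: only pairwise dependencies are carried along, the class-conditional pattern probabilities use the same simplified adjustment shown earlier, and a single EM update of the accuracies and the prevalence is performed per outer iteration instead of running the inner EM to convergence. All function and argument names are our own and are not taken from the authors' R code.

```r
# Minimal sketch of the alternating loop for three diagnostic tests.
# y: n x 3 matrix of 0/1 test results; prev, se, sp: starting values for the
# prevalence and test accuracies; cov_se, cov_sp: starting values for the raw
# pairwise dependencies (pairs (1,2), (1,3), (2,3)) in the infected and
# non-infected class, respectively.
iterative_lca <- function(y, prev, se, sp, cov_se, cov_sp,
                          max_iter = 1000, tol = 1e-5) {
  pairs <- rbind(c(1, 2), c(1, 3), c(2, 3))
  class_prob <- function(r, marg, cov) {
    p <- marg^r * (1 - marg)^(1 - r)
    out <- prod(p)
    for (k in 1:3) {
      s <- if (r[pairs[k, 1]] == r[pairs[k, 2]]) 1 else -1
      out <- out + s * cov[k] * p[setdiff(1:3, pairs[k, ])]
    }
    max(out, 1e-12)                      # numerical guard against inadmissible values
  }
  loglik_old <- -Inf
  for (iter in seq_len(max_iter)) {
    # step (ii): dependencies fixed, update accuracies and prevalence via EM
    p_pos <- apply(y, 1, class_prob, marg = se,     cov = cov_se)
    p_neg <- apply(y, 1, class_prob, marg = 1 - sp, cov = cov_sp)
    w <- prev * p_pos / (prev * p_pos + (1 - prev) * p_neg)   # P(infected | pattern)
    prev <- mean(w)
    se <- colSums(w * y) / sum(w)
    sp <- colSums((1 - w) * (1 - y)) / sum(1 - w)
    # step (iii): accuracies fixed, recompute the raw pairwise dependencies
    cov_se <- sapply(1:3, function(k) {
      i <- pairs[k, 1]; j <- pairs[k, 2]
      sum(w * (1 - y[, i]) * (1 - y[, j])) / sum(w) - (1 - se[i]) * (1 - se[j])
    })
    cov_sp <- sapply(1:3, function(k) {
      i <- pairs[k, 1]; j <- pairs[k, 2]
      sum((1 - w) * y[, i] * y[, j]) / sum(1 - w) - (1 - sp[i]) * (1 - sp[j])
    })
    # step (iv): stop when the log-likelihood no longer changes
    loglik <- sum(log(prev * p_pos + (1 - prev) * p_neg))
    if (abs(loglik - loglik_old) < tol) break
    loglik_old <- loglik
  }
  list(prevalence = prev, sensitivity = se, specificity = sp,
       cov_infected = cov_se, cov_noninfected = cov_sp,
       loglik = loglik, iterations = iter)
}
```

With simulated data y, a call using the scenario 2 parameters as starting values could look like iterative_lca(y, prev = 0.40, se = c(0.90, 0.70, 0.65), sp = c(0.99, 0.80, 0.85), cov_se = c(0, 0, 0.121), cov_sp = c(0, 0, 0.086)).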
We implemented the algorithm in R [version 3.5.0; (
We tested the applicability of the algorithm for three diagnostic tests in a single population by conducting a simulation study. Therefore, we took different combinations of the test accuracies, prevalence and conditional dependencies into account. These scenarios allowed an assessment of the performance of the iterative approach presented in this publication compared with that of the conditionally independent latent class analysis and the Bayesian approach for conditionally dependent tests. All the simulation scenarios are motivated by diagnostic tests used in veterinary medicine. As a small sample size leads to an increased margin of error, we simulated 10,000 observations.
We considered the following cases:
(1) Three independent diagnostic tests with high test accuracies in a population with a moderate prevalence:
Diagnostic tests are conditionally independent if they are based on different biological principles, for example, if a tumor is detected using physical examination, medical imaging (e.g., sonography) and microscopic examination of a tissue sample. Scenarios such as this one should not cause any problems for the conditionally independent latent class analysis and should result in very accurate estimates from that method. The new approach presented in this publication should therefore provide the same results as models assuming conditionally independent diagnostic tests. Hence, this scenario serves as basic validation for the newly fitted model.
(2) Two highly dependent tests with low test accuracies and a third test with low dependencies and high test accuracies in a population with a high prevalence:
This scenario may be the most problematic in the conditionally independent latent class model: The two dependent tests may cause many matching results that lead to an overestimation of their test accuracies and underestimated values in the third test. This situation applies, for instance, to the diagnosis of infectious diseases with one antibody test applied to two different sample types (e.g., serum, feces, milk, etc.) and the often more accurate direct detection of the pathogen (
(3) Two highly dependent tests with low test accuracies and a third test with low dependencies and high test accuracies in a population with a low prevalence:
This is generally the same scenario as (2) but with a lower prevalence, which is common for many diseases. In this case, only a small proportion of the sample contains positive responses compared to the other possible response patterns. This phenomenon makes the estimation of both the prevalence and sensitivities more ambiguous and therefore more prone to errors.
(4) Three diagnostic tests with moderate test accuracies and medium dependencies in a population with a high prevalence:
In this scenario, all three tests are conditionally dependent on each other, resulting in an overestimation of their test accuracies in the latent class model, which assumes conditional independence. For instance, three different veterinarians may perform a physical examination on the same group of animals with a suspected disease. They all have different qualifications and therefore different diagnostic sensitivities and specificities. However, experience in the same work environment with the same time and budgetary constraints can be the cause of consistent misdiagnoses (
(5) Three diagnostic tests and a population with values for the test accuracies and the prevalence from a practical example with estimated values for the dependency structure:
To ensure that the procedure is evaluated under realistic conditions, this scenario uses the results from a prevalence study for
Input parameter values for the data simulation of the five scenarios. Dependencies are given as raw values with the standardized values in parentheses.
Parameter | Scenario 1 | Scenario 2 | Scenario 3 | Scenario 4 | Scenario 5 |
Prevalence in % | 30.00 | 40.00 | 3.00 | 40.00 | 20.00 |
Sensitivity Test 1 | 90.00 | 90.00 | 90.00 | 80.00 | 72.00 |
Sensitivity Test 2 | 85.00 | 70.00 | 70.00 | 66.00 | 65.00 |
Sensitivity Test 3 | 90.00 | 65.00 | 65.00 | 70.00 | 97.00 |
Specificity Test 1 | 95.00 | 99.00 | 99.00 | 95.00 | 98.00 |
Specificity Test 2 | 95.00 | 80.00 | 80.00 | 85.00 | 99.00 |
Specificity Test 3 | 99.00 | 85.00 | 85.00 | 88.00 | 98.00 |
Dependency Tests 1 & 2, infected class | 0.000 (0.000) | 0.000 (0.000) | 0.000 (0.000) | 0.038 (0.200) | 0.129 (0.600) |
Dependency Tests 1 & 3, infected class | 0.000 (0.000) | 0.000 (0.000) | 0.000 (0.000) | 0.046 (0.250) | 0.008 (0.100) |
Dependency Tests 2 & 3, infected class | 0.000 (0.000) | 0.121 (0.600) | 0.121 (0.600) | 0.087 (0.400) | 0.012 (0.150) |
Dependency Tests 1, 2 & 3, infected class | 0.000 (0.000) | 0.000 (0.000) | 0.000 (0.000) | −0.004 (−0.050) | 0.000 (0.000) |
Dependency Tests 1 & 2, non-infected class | 0.000 (0.000) | 0.000 (0.000) | 0.000 (0.000) | 0.016 (0.200) | 0.001 (0.100) |
Dependency Tests 1 & 3, non-infected class | 0.000 (0.000) | 0.000 (0.000) | 0.000 (0.000) | 0.018 (0.250) | 0.003 (0.150) |
Dependency Tests 2 & 3, non-infected class | 0.000 (0.000) | 0.086 (0.600) | 0.086 (0.600) | 0.046 (0.400) | 0.001 (0.100) |
Dependency Tests 1, 2 & 3, non-infected class | 0.000 (0.000) | 0.000 (0.000) | 0.000 (0.000) | −0.001 (−0.050) | 0.000 (0.000) |
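For illustration, the following R sketch generates observations for one scenario by drawing from the eight possible response patterns. The values correspond to scenario 2 of the table above; the dependency adjustment is the same simplified, pairwise-only parameterization used in the earlier sketches and may differ from the authors' exact simulation model.

```r
set.seed(1)
prev <- 0.40
se <- c(0.90, 0.70, 0.65); sp <- c(0.99, 0.80, 0.85)
cov_se <- c(0, 0, 0.121)   # raw pairwise dependencies (1,2), (1,3), (2,3), infected class
cov_sp <- c(0, 0, 0.086)   # same pairs, non-infected class

patterns <- as.matrix(expand.grid(T1 = 0:1, T2 = 0:1, T3 = 0:1))
pairs <- rbind(c(1, 2), c(1, 3), c(2, 3))

class_prob <- function(y, marg, cov) {          # same adjustment as sketched above
  p <- marg^y * (1 - marg)^(1 - y)
  out <- prod(p)
  for (k in 1:3) {
    s <- if (y[pairs[k, 1]] == y[pairs[k, 2]]) 1 else -1
    out <- out + s * cov[k] * p[setdiff(1:3, pairs[k, ])]
  }
  out
}

p_mix <- prev       * apply(patterns, 1, class_prob, marg = se,     cov = cov_se) +
         (1 - prev) * apply(patterns, 1, class_prob, marg = 1 - sp, cov = cov_sp)

counts <- drop(rmultinom(1, size = 10000, prob = p_mix))    # 10,000 observations
y_sim  <- patterns[rep(seq_len(nrow(patterns)), counts), ]  # one row per simulated animal
```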
We applied nine different sets of starting values (6 well-chosen and 3 poorly chosen) to each of the five scenarios. As it can be assumed that prior knowledge of the applied tests, their dependency structure and the studied population is available, we considered six different sets of well-chosen "informative" starting values as follows (see
1. The correct values for the test accuracies, the dependency structure and the prevalence.
2. The correct values for the test accuracies and the prevalence; the dependency of the tests is stronger than that simulated.
3. The correct values for the test accuracies and the prevalence; the dependency of the tests is weaker than that simulated. The only exception is scenario 1 (independent tests): as weakening the dependency of independent tests leads to negative dependencies and negative dependencies are not biologically justifiable, another set of positive dependencies is used instead.
4. The correct values for the dependency structure; the test accuracies are better than those simulated, and the prevalence differs from the simulated value.
5. The correct values for the dependency structure; the test accuracies are poorer than those simulated, and the prevalence differs from the simulated value.
6. The values for the test accuracies, the dependency structure and the prevalence all differ (slightly) from the simulated data.
On the other hand, if a new diagnostic test is used, false assumptions about the underlying dependency structure, the test accuracy and even the prevalence are possible. Therefore, we also took three sets of poorly chosen starting values into consideration that deviate greatly from the simulated values in terms of the test accuracy and the dependency structure (see
1. A value of 50% for all the test accuracies and the prevalence; the tests are assumed to be independent of each other.
2. Incorrect assumptions about which tests are dependent on each other, an incorrect ratio of the test accuracies and a prevalence that differs from the simulated value.
3. The results from the conditionally independent latent class analysis for the test accuracies and the prevalence as well as the incorrect dependencies.
In some cases, there are justifiable restrictions on the resulting parameter values. For example, negative dependencies between two diagnostic tests with the same biological testing principle are very unlikely. Another justifiable restriction is to fix the dependencies of tests known to be independent at zero. Since the test accuracies are already limited to the unit interval by the EM algorithm, further restrictions always depend on the situation and are therefore difficult to determine in general.
We examined the effect of parameter restrictions on the estimations of the iterative approach by repeating the calculations and adding restriction rules. As all limitations of the resulting parameter values require knowledge of the population, the disease and the diagnostic tests used in the study, they depend on the setting and are not generalizable. Thus, we focused on the most basic limitations and excluded unrealistic dependencies [standardized values < −1 or >1; (
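As an illustration of such a restriction, the dependencies recomputed in step (iii) of the algorithm could be clipped to a plausible range before the next iteration. The concrete bounds always depend on the application; the ones used below (non-negative values, standardized dependency of at most 1 under the correlation-type standardization assumed earlier) are only an example.

```r
# Clip the recomputed raw pairwise dependencies to a plausible range: negative
# values are excluded (tests sharing a biological principle are not expected to
# disagree more often than by chance), and the upper bound corresponds to a
# standardized dependency of 1. 'acc' holds the three sensitivities (infected
# class) or specificities (non-infected class).
restrict_cov <- function(cov, acc) {
  pairs <- rbind(c(1, 2), c(1, 3), c(2, 3))
  upper <- sqrt(acc[pairs[, 1]] * (1 - acc[pairs[, 1]]) *
                acc[pairs[, 2]] * (1 - acc[pairs[, 2]]))
  pmin(pmax(cov, 0), upper)
}

# e.g., inside step (iii) of the loop: cov_se <- restrict_cov(cov_se, se)
```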
We also calculated the results for the five scenarios (
The five simulation scenarios consider different possible applications. Hence, we reflect on their results individually before we analyze them jointly to investigate possible differences. All results are shown in detail in
All three latent class analysis approaches were able to estimate the parameters precisely when independence was initially assumed (see
Nevertheless, overall, the results showed that all three approaches are equally applicable for evaluating independent diagnostic tests.
The conditionally independent latent class analysis was not able to detect the connection between the tests for any of the applied starting values and therefore misjudged their accuracy by up to 20%. In contrast, the iterative method was able to determine the simulated parameters with only minor deviations of at most 8% for all well-chosen starting values (see
The low prevalence in this scenario complicated the estimation. While the conditionally independent latent class analysis resulted in strongly deviating and unrealistic values for the outcome (e.g., a sensitivity of 13% for test 1), the iterative approach was mostly able to calculate the simulated values by using well-fitting starting values. Only the results for the sensitivity of test 1 posed a problem in two cases with deviations of more than 50% (see
This scenario resulted in the largest differences from the simulated values in the iterative approach (see
Differences between the configured input data and model outcomes using six sets of starting values for the latent class model assuming conditional independence and the iterative model under the conditions of scenario 3 [CLCA, conditionally independent LCA; SV1, set of starting values 1 (SV2-SV6 are defined analogously); Pr, Prevalence; Se1, Sensitivity of test 1; Sp1, Specificity of test 1 (Se2, Se3, Sp2, and Sp3 are defined analogously)].
The Bayesian approach led to better results than the frequentist method, as the deviations in the values for the sensitivity of test 1 reached a maximum of 11% for the well-chosen starting values (see
The estimates of the conditionally independent latent class analysis deviated by up to 16% from the simulated values while the iterative method and the Bayesian approach were able to obtain results that were more precise (maximum deviations of 6 or 12%; see
However, there was one exception in the Bayesian approach. Starting value set 5 (underestimated prevalence and test accuracy) led to values that deviated up to 20% from the simulated values. The initial misjudgment of the test accuracies and the prevalence therefore has a stronger effect on the Bayesian approach in this scenario than on the frequentist method.
The results of the poorly chosen starting values differed considerably from the simulated values in both methods (see
While the conditionally independent latent class analysis overestimated the values by almost 20% in this simulation, all six sets of well-chosen starting parameters resulted in approximately correct values for the iterative approach. The deviations in the estimated sensitivities reached up to 8%, whereas they were at most 2% for the specificities and the prevalence (see
The Bayesian method resulted in similar values for most of the well-chosen starting values with deviations up to 6% for the sensitivities and 5% for the specificities and the prevalence (see
Both the iterative, frequentist approach and the Bayesian approach resulted in strongly deviating parameter values with the poorly chosen starting values (see
The limitation to reasonable positive dependencies had no effect on the results for most starting values since the resulting dependencies were already within the defined boundaries. For starting value set 5, however, the differences between the input data and the outcome decreased remarkably (see
Differences between the configured input data and model outcomes using the six sets of starting values for the latent class model assuming conditional independence and by using the iterative model under the conditions of scenario 3 with only positive pairwise dependencies allowed [CLCA, conditionally independent LCA; SV1, set of starting values 1 (SV2-SV6 are defined analogously); Pr, Prevalence; Se1, Sensitivity of test 1; Sp1, Specificity of test 1 (Se2, Se3, Sp2, and Sp3 are defined analogously)].
In contrast, the parameter restrictions had very little effect in the Bayesian model (see
The latent class analysis considering the dependency structure was able to calculate less biased parameter values than the classical frequentist latent class analysis for most of the informative starting values. Both the Bayesian and iterative methods produced very similar results (see
Maximum deviations of the three compared methods from the simulated values in the five simulated scenarios with well-chosen starting values, displayed as values in percent.
Scenario | Conditionally independent LCA | Bayesian approach | Iterative approach |
Scenario 1 | 1.0 | 6.9 | 4.4 |
Scenario 2 | 20.4 | 6.9 | 7.7 |
Scenario 3 | 77.0 | 15.0 | 86.4 |
Scenario 4 | 15.7 | 18.7 | 5.9 |
Scenario 5 | 19.6 | 5.9 | 8.1 |
Parameter restriction (scenario 3) | 77.0 | 15.1 | 57.5 |
Although the iterative approach yielded varying results for the different starting value sets, the associated log-likelihood always had the same value within all five simulation scenarios. This finding indicates that all results within a scenario, although they have very different values, represent a local maximum and are therefore equally likely under the observed responses. These results were estimated in only a few iteration steps in all five scenarios (a maximum of 9 iteration steps in scenario 3, see
In this publication, we presented an iterative, frequentist latent class approach for the evaluation of conditionally dependent diagnostic tests. We compared it to the Bayesian method and the classical conditionally independent analysis by performing a simulation study.
If two diagnostic tests with the same biological principle are used, the same reasons (e.g., cross-reactions, pathogen concentration) will lead to incorrect diagnoses, which strongly connect the outcome of these tests (
These differences decrease or are even eliminated by using a model that considers the conditional dependencies. However, the accuracy of the estimates from the presented iterative method as well as the Bayesian approach strongly depends on two factors: the starting values and the size of the underlying parameters.
Starting values for the test accuracy of the diagnostic tests used can be obtained from the manufacturer's evaluation studies or from previous studies employing these tests. The conditional dependencies can be estimated by examining the biological methods of the tests and comparing them to each other. Similar procedures are more likely to be highly dependent (
The size of the underlying parameters also influences the quality of the estimates. For scenario 3 with a low prevalence and a strong dependency, all the compared methods attained suboptimal performance within the simulation study. This deteriorated accuracy in populations with low prevalence was also observed in the Bayesian framework in simulation studies (
Despite this phenomenon, the log-likelihoods of the different results within each scenario, both the correct and the incorrect ones, converge to the same value. This finding suggests that the function has several maxima. Each result found by our method represents one local maximum, and all these maxima are equally probable with the given dataset. Therefore, this model is not able to find a unique solution, and well-chosen starting values are needed to ensure convergence to the correct parameter values. The reason is that the addition of dependency terms increases the number of parameters to be estimated, while the information provided by the observed response pattern does not change. The proposed method takes this property into account by estimating parameters in a stepwise algorithm that regards the dependency terms and the test accuracies alternately as fixed. As a result, there is a positive number of degrees of freedom in each step of the algorithm, and the identifiability is improved. The model is applicable to situations in which results from at least two diagnostic tests are available. However, as the two-test case was already underidentified without the additional dependency terms (df = −1) and therefore offered no chance of a unique solution in the proposed iterative approach, we focused our analyses on the special case of three diagnostic tests. Even then, the increased number of tests is not sufficient for a unique result.
The ambiguity of the solutions occurs regardless of the chosen method, as the addition of dependency terms results in more parameters to be estimated than there is information available in the data. Therefore, no model that takes the dependency between all tests into account can arrive at a unique solution. This lack of information has to be compensated by accurate prior information. If the priors are (unknowingly) false, the model is not always able to find the right solution for some parameter compositions. In such cases, uncertainty about the assumptions at the beginning of the analysis translates into unreliable results, and not even an iterative calculation is able to solve this problem. Thus, good prior knowledge is necessary for accurate estimates, and uninformative priors should not be used with this method (the Bayesian or the iterative, frequentist approach). Other researchers have already observed and pointed out the importance of justified priors in the Bayesian framework (
Establishing boundaries for the dependencies improves the parameter estimates from the iterative approach further if the dependencies are within a certain interval and the values outside the interval can be excluded with certainty. These boundaries can prevent major deviations in the results, as shown in the second calculation of scenario 3. However, parameter restrictions should be used with caution. Only unrealistic values for a certain application should actually be excluded. If that is not the case, a true underlying parameter value may unknowingly be rejected as a possible solution, and the algorithm is no longer able to calculate the correct parameter set. Hence, the restrictions help to improve parameter estimation but also bear the risk of excluding the correct results from the start by choosing incorrect limits. Thus, if one is unsure of which limits to choose, it is better to completely remove them and carry out the estimation only with the best possible starting values.
Overall, the fit of the latent class model and the parameter estimates can be improved by allowing an interaction term. If the results of three diagnostic tests are available, both the Bayesian method and the iterative, frequentist approach presented in this paper are strongly dependent on the prior information due to the lack of information in the data. If there is insufficient knowledge about the test accuracies, the prevalence and the dependencies of the tests, and hence, these values are initially misjudged, both methods will lead to incorrect results. Extensive prior knowledge is therefore the basis for the applicability of the latent class analysis considering conditional dependencies, both in the Bayesian and frequentist frameworks.
The presented simulation study showed that considering a possible dependency structure improves the estimation in a latent class analysis. However, it was unable to clearly determine which method resulted in more accurate values overall, as the iterative, frequentist approach and the Bayesian approach performed differently in the presented scenarios. While both methods are dependent on prior knowledge in the form of well-chosen starting values and prior distributions, the simulation studies carried out in this publication suggest that the iterative, frequentist method requires previous knowledge that is oriented more toward practical experience and therefore may be easier to obtain.
Overall, the simulation studies presented here indicate that the iterative, frequentist approach is an appropriate method to evaluate conditionally dependent diagnostic tests.
The original contributions presented in the study are included in the article/
CS and AC: conceptualization. CS: data curation, formal analysis, investigation, methodology, software, validation, and writing—original draft. LK and AC: project administration and supervision. CS, LK, and AC: writing—review and editing. All authors contributed to the article and approved the submitted version.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The Supplementary Material for this article can be found online at: