Associating Cognition With Amyloid Status Using Partially Ordered Set Analysis

Background: Brain amyloid-beta positivity is associated with cognitive impairment and dementia, but whether there are specific aspects of cognition that are most linked to amyloid-beta is unclear. Analysis of neuropsychological test data presents challenges since a single test often requires drawing upon multiple cognitive functions to perform well. It can thus be imprecise to link performance on a given test to a specific cognitive function. Our objective was to provide insight into how cognitive functions are associated with brain amyloid-beta positivity among samples consisting of cognitively normal and mild cognitively impaired (MCI) subjects, by using partially ordered set models (POSETs).
Methods: We used POSET classification models of neuropsychological test data to classify samples to detailed cognitive profiles using ADNI2 and AIBL data. We considered 3 gradations of episodic memory, as well as cognitive flexibility, verbal fluency, attention, and perceptual motor speed, and performed group comparisons of cognitive functioning stratified by amyloid positivity (yes/no) and age (<70, 70–80, 81–90 years). We also employed random forest methods stratified by age to assess the effectiveness of cognitive testing, alongside demographic variables and APOE4 allele count, in predicting amyloid positivity.
Results: In ADNI2, differences in episodic memory and attention by amyloid status were found for the <70 and 70–80 years groups. In AIBL, episodic memory differences were found in the 70–80 years age group. In both studies, no cognitive differences were found in the 81–90 years group. The random forest analysis indicates that variable importance in classification depends on age. Cognitive tests that target an intermediate level of episodic memory and delayed recall, in addition to APOE4 allele count, are the most important variables in both studies.
Conclusions: In the ADNI2 and AIBL samples, the associations between specific cognitive abilities and brain amyloid-beta positivity depended on age, but in general episodic memory was most consistently predictive of brain amyloid-beta positivity. Random forest methods and OOB error rates establish the feasibility of predicting the presence of brain beta-amyloid using cognitive testing, APOE4 genotyping and demographic variables.


INTRODUCTION
Novel therapies for early Alzheimer's Disease (AD) are in development which, if approved for clinical use, may increase the demand for confirmation of abnormally high levels of AD biomarkers such as brain beta-amyloid. Determination of brain beta-amyloid status using PET imaging of amyloid in the brain or CSF sampling is either expensive or invasive. Strategies that can increase confidence in decisions to order, or not to order, such assessments would therefore be very useful, especially if they use information that can be obtained routinely. Amyloid identification is also of interest for AD clinical trial enrichment. Neuropsychological (NP) tests are often used to help providers diagnose AD, and therefore may also have an important role in predicting amyloid positivity. However, NP test batteries used in AD are by design polyfactorial, in that multiple cognitive functions are measured to determine whether the nature and magnitude of any cognitive impairment observed is suggestive of dementia, and if so, AD. As a result, it is difficult to link performance on a test to specific functions. Previous studies have investigated the cognitive profile associated with the presence of amyloid in AD dementia, MCI and cognitively unimpaired individuals using standard linear model approaches (1-7). Harrington et al. observed that among MCI cases, those who are amyloid positive vs. negative had greater deficits in verbal and visual memory (Hedges' g difference: 0.66 and 0.35, respectively) and attention/processing speed (Hedges' g = 0.31), but higher functioning in language (Hedges' g = −0.70) (1). However, results among cognitively unimpaired subjects have not been consistent across studies. For example, some studies did not observe statistically significant differences (1,2), whereas other studies using larger sample sizes found associations between episodic memory and amyloid (3,4,7,8).
A meta-analysis of cognitively normal subjects with and without amyloid (9) found differences with small effect sizes for visuospatial function, processing speed, episodic memory, semantic memory, and executive function. Together, these findings indicate that the cognitive differences between amyloid positive and negatives in cognitively normal subjects can appear nuanced and difficult to detect.
Additionally, age of onset can impact the cognitive profiles associated with AD (10). Patients with onset of AD at younger ages demonstrate praxis, language impairment and visuospatial problems, while older onset patients demonstrate a greater deficit in visual memory and temporal orientation (11,12). In cognitively normal older adults, a decline of episodic memory has been well studied and is reviewed by Light (13) and Tromp et al. (14). Additional deficits also may be apparent in attention, inhibition, cognitive flexibility (15) and processing speed (16).
One issue with these results is that tests are often grouped into subscales, and associated with certain functions. This approach requires replication within each scale, which is difficult given the time generally required to administer cognitive tests. The polyfactorial nature of tests leads to a reduction in internal consistency, which also hampers statistical power. Critically, it also complicates interpretation of scale scores.
The objective of this study is to characterize age-stratified cognitive profiles using NP test specificity among cognitively normal and MCI subjects, with the goal of identifying which cognitive abilities are most strongly related to amyloid biomarker measurements. In turn, this will aid in identifying promising targets for cognitive assessment in clinically practical amyloid detection tools. Focus is given to samples consisting of cognitively normal and MCI subjects, since such subjects are likely to be targets for future AD treatments. We hypothesize that tailored and abbreviated sets of cognitive tests that are in line with these targets can help improve prediction of amyloid positivity. Given that age and APOE genotype are important predictors of amyloid (2,17-19), we explore stratification of cognitive testing by these variables.
To accomplish this, we applied partially ordered sets (POSET)-based statistical classification methods to NP test results from cognitively normal and MCI subjects. These data were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database and the Australian Imaging, Biomarkers and Lifestyle Flagship Study of Aging (AIBL) database. We were able to select NP tests across the respective batteries that share the cognitive functions being assessed. This allowed for comparability of classification results even as the data sets were based on different test batteries, since the results are in relation to performance with the cognitive functions.
POSET statistical methods are classification models that "sift" through responses to polyfactorial NP measures and systematically identify specific cognitive functions that are relatively impaired. Importantly, a theoretical statistical framework for this approach has been established (20,21). POSETs have the ability to manage the aggregate response patterns and methodically identify specific functions that are the source of poor performance. For example, a Category Fluency test involves the person being asked to say all the words they can think of within a given semantic category, e.g., "vegetables." The functions required for the test include verbal fluency, attention, and cognitive flexibility. If the individual performed badly on this test but performed well on another test, e.g., Boston Naming, which involves attention and verbal fluency, poor performance on Category Fluency could then be attributed to poor functioning in the domain of cognitive flexibility. Of course, variability in response behavior and the test score values themselves must be taken into account when assessing the strength of this evidence. This is formalized in a Bayesian statistical framework. Importantly, this systematic approach can be extended to more complex scenarios, when there are multiple functions involved within and across measures, as we will see here. POSET methods thus explicitly link functions and measures, classify subjects to cognitive profiles, and provide a means for data-analytic validation of these links.
POSETs, implemented using custom-written scripts in the Lua programming language, have been successfully used in previous NP data studies to investigate cognitive profiles associated with progression from MCI to AD (22,23). Baseline cognitive profiles of ADNI subjects were used to determine cognitive functions related to the risk of conversion from MCI to AD within 2 years. Deficits in specific levels of episodic memory [recall of items after distraction as in the Auditory Verbal Learning Test List B (22)], perceptual motor speed, and cognitive flexibility were found to be potentially useful cognitive predictors of conversion. The presence of an APOE4 allele also had a strong association. Longitudinal change in specific functions was considered as well. POSETs have also been used in classifying schizophrenic cognitive profiles and functional recovery in schizophrenia (24,25), and cognitive impairment patterns in low birthweight/early birth children at early grade school (26).
Overall, the objective of our POSET-based analyses was to provide insight into how cognition is associated with amyloid positivity, and to identify cognitive function targets for practical clinical decision tools to help predict amyloid presence. Focus was given to samples with primarily cognitively normal subjects, along with some MCI subjects. These samples are thus from potential screening populations. For cognitively normal subjects, finding cognitive differences by amyloid status has been challenging in prior analyses (9). We then used random forest methods to assess the importance of individual cognitive tests in prediction.

ADNI and AIBL Studies
ADNI is an NIH-funded study that aims to advance understanding of AD through the use of repeated measurements over several years. The measurements include MRI, PET, clinical assessments, and NP tests. The data used in this study for the POSET analysis were from the ADNI2 phase, and the information in the following sections describes the ADNI2 data only. For further information about the ADNI phases, see www.adni-info.org.
Similarly, AIBL was designed to investigate the factors that contribute to the development of AD. It focused largely on the recruitment of a cognitively normal and MCI population and is following them longitudinally. Data were collected by the AIBL study group, and the AIBL study methodology has been reported previously (27,28). Further information about the AIBL study can be found on their website, see www.aibl.csiro.au.
The inclusion criteria for ADNI required participants to be aged 55-90 years old, have a minimum of 6 years of education, be fluent in either English or Spanish and have no other neurological conditions. AIBL inclusion criteria required participants to be aged ≥60 years of age and have no other neurological condition or a diagnosis of cancer, diabetes, or excessive regular consumption of alcohol. Both studies follow similar criteria for the diagnosis of dementia, based on the criteria of the National Institute of Neurological and Communicative Disorders and Stroke-Alzheimer's Disease and Related Disorders Association, and MCI based on the criteria proposed by Petersen et al. (29) [AIBL (27,30), ADNI (31, 32)].
The ADNI group focused much more on recruiting MCI and early AD subjects than the AIBL group did. To enhance comparability, we only consider normal, normal with no MCI diagnosis but with subjective memory concerns (SMC), and early MCI subjects in ADNI2, and only normal and MCI subjects in AIBL. Late MCI subjects in ADNI2 have clear cognitive deficits, so including them would widen the variability in performance and could impair comparability of the samples. Within the series of ADNI studies, we selected ADNI2 due to the consistent amyloid imaging and careful characterization of early MCI subjects. The AIBL NP tests, although different, were designed to be comparable to the ADNI NP tests. We limit NP tests to those used in the same domains in a prior ADNI analysis (22), to allow for direct comparison.

Analysis Samples
Participant data were selected from the ADNI2 databases if they had a status of cognitively normal controls, subjective memory concern but no MCI diagnosis (SMC) [see (32) for diagnostic criteria; henceforth considered with the normal controls] or early MCI; known amyloid status determined from PET imaging or CSF; APOE4 status recorded; and their NP battery test scores were available. AIBL subject data were selected for cognitively normal controls and MCI (which were not subdivided further, as in ADNI2). The demographics for each cognitive category are summarized in Tables 1A,B. The final ADNI2 sample had a total of 445 subjects consisting of 244 healthy adults (cognitively normal and subjective memory concern) and 201 MCI subjects. The group was 52% male, and mean age was 71.6 years (SD = 6.5). APOE4 allele count was APOE4 = 0: 64.1%, APOE4 = 1: 31.3%, and APOE4 = 2: 4.3%. Amyloid status across the group was 40.0% positive. The final AIBL sample with amyloid imaging consisted of 210 subjects: 175 healthy adults and 35 MCI subjects. The group was 50.4% male, mean age was 74.4 years (SD = 6.9), and APOE4 allele count was APOE4 = 0: 67%, APOE4 = 1: 29.0%, and APOE4 = 2: 4.0%. Amyloid status across the group was 41.0% positive. Although the ADNI2 dataset contained more MCI subjects, the groups are comparable on age, gender, amyloid positivity, APOE4 allele count and cognitive status. The ADNI2 group did include a few subjects in the 55-60 years old range, and the AIBL sample had a lower proportion of subjects with <13 years of education. Cognitive classifications were conducted for healthy (cognitively normal and SMC) and MCI subjects. Given that MCI subjects are more impaired and have a higher likelihood of being amyloid positive, and since the mix of cognitively normal to MCI subjects differs, we decided to conduct parallel analyses as opposed to pooling POSET classification results for combined analyses.

Determination of Amyloid Status
In the ADNI study, the presence of amyloid was determined by the ADNI Biomarker and PET Cores from either CSF or PET data. PET data was acquired using the tracers Florbetapir (AV45) or fluorodeoxyglucose (FDG). Full details of the ADNI CSF and PET analysis protocols and their derivation of amyloid status can be found on the ADNI website and elsewhere (33,34

Neuropsychological Data and Determination of Cognitive Functions
The ADNI2 and AIBL studies include a wide battery of NP tests. A selection of tests was chosen based on the types of cognitive functions they tested, as listed in Table 2. We conducted analyses by 3 age groups: <70 years, 70-80 years, and 81-90 years. Depending on the age of the subject, means and standard deviations from tests within the corresponding age group were used to derive z-scores. These age ranges were chosen based on expert clinical opinion (author AJL) and sample size balance across both studies. Standardized norming in the CVLT II Manual also is conducted in these age groups [see (36)]. For our POSET analysis, experienced neuropsychologists developed a mapping between each NP test in ADNI2 (author JJ) and AIBL (authors JJ and PM) and the different cognitive functions being measured (Table 2). These mappings are validated data analytically (20), as we describe below. The cognitive functions included three levels of episodic memory, verbal fluency (VF), attention (ATT), cognitive flexibility (CF), and perceptual motor speed (PS). Three levels of episodic memory were distinguished to better represent the differences between immediate recall and delayed recall that are concealed in the aggregate scores of the NP tests (Table 3). Level 3 (EM3) is the highest level, where subjects are able to recall given information at least 30 min later following a series of distractors. Level 2 (EM2) is the ability to recall information after a short duration (10 min), with distractors. For Level 1 (EM1), subjects are able to recall information immediately after receiving it but cannot recall it after a delay. These levels are hierarchically related: high-level performance on level 3 implies high-level performance at levels 1 and 2. Moreover, lower-level performance at level 1 implies lower-level performance at levels 2 and 3 as well.
This ordered relationship between levels reduces the number of possible profiles, so, for instance, a subject cannot be high level at EM3 and low level at EM1.
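The effect of this hierarchy on the number of candidate profiles can be illustrated with a small enumeration. The sketch below is only an illustration under a simplifying assumption that each of the seven functions is either high (1) or low (0); the function `respects_em_hierarchy` and the variable names are hypothetical. It counts raw profiles only and does not reproduce the equivalence-class step that yields the final model states.

```python
from itertools import product

# Seven cognitive functions from the text, each coded 1 (high) or 0 (low).
FUNCTIONS = ["EM1", "EM2", "EM3", "VF", "ATT", "CF", "PS"]

def respects_em_hierarchy(profile):
    """High EM3 requires high EM2, and high EM2 requires high EM1."""
    return profile["EM3"] <= profile["EM2"] <= profile["EM1"]

# Enumerate every binary profile, then keep those consistent with the hierarchy.
all_profiles = [dict(zip(FUNCTIONS, bits))
                for bits in product([0, 1], repeat=len(FUNCTIONS))]
valid_profiles = [p for p in all_profiles if respects_em_hierarchy(p)]

print(len(all_profiles), len(valid_profiles))  # 128 64
```

Only four of the eight possible episodic memory patterns survive the constraint, which halves the number of raw profiles before any test-based identifiability considerations are applied.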

POSET Model Generation
Each NP test is associated with specific cognitive functions required to perform well on it. POSET models consist of classification states comprised of detailed profiles of cognitive functioning that reflect discrete performance levels across the range of associated functions (20,21,24,25,37). One state is considered higher than another state if its associated performance levels are at least as high for all cognitive functions as those of the lower one, and strictly greater for at least one of the functions. If neither state is higher than the other, in that each state is at a high level for a function that the other is not, then the two states are said to be incomparable. Allowing for incomparability enhances the flexibility to model response data. What is considered high and low level is relative within each study sample. The respective POSET models are algorithmically generated based on the cognitive specifications of the tests, as in Table 2 (38). Equivalence classes of profiles are identified, where profiles are in the same equivalence class if they cannot be statistically distinguished by the test battery. Hence, the models are identifiable and well-defined. The model is comprised of classification states that correspond to these equivalence classes. For profiles in the same equivalence class, the functions on which the profiles differ are considered undetermined. For example, in Table 2, note that cognitive flexibility is always assessed in conjunction with other cognitive functions (in addition to attention). There are thus limitations in definitively distinguishing a subject's cognitive flexibility functioning levels. Specifically, when the other functions with which it is being tested are at low levels, performance on the respective tests is expected to be poor regardless of the functioning level of cognitive flexibility. Hence, its level cannot be determined in such cases. This phenomenon is termed confounding in classification.
In the model based on the ADNI2 specifications in Table 2, the affected states were 7, 14, 21, and 28. Note that for these states, associated profiles indicate low levels for verbal fluency and perceptual motor speed. These are the two functions that are tested with cognitive flexibility in Category Fluency and Trail Making Test B, respectively (20,39). Hence, performance with cognitive flexibility is confounded in those situations, and its functioning level cannot be determined. In ADNI2 and AIBL, all tests were specified as involving Attention, so that the bottom states (29 and 33, respectively) reflect that Attention is at a low level and that, due to confounding, the other function levels are undetermined. We assume that all functions are at low levels in these states. The AIBL state profiles do not have confounding beyond the bottom state. This is due to the larger variety of cognitive tests in the AIBL cognitive battery.
Partially ordered relationships between states allow for more flexibility and richness than linearly ordered models, and can represent the complex response patterns from NP tests. They also take advantage of replication in testing of function, as POSETs have essential statistical convergence properties such as accurately identifying a subject's cognitive profile with sufficient measurement. A main criterion for model fit involves assessing consistency in response behavior with respect to tests that are hierarchically related by the associated cognitive skill requirements. Additional information on POSET models is available elsewhere (20,21,40).
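The ordering rule for states can be sketched in a few lines. This is a minimal illustration, assuming binary (high/low) levels so that each state can be represented by the set of functions at a high level; the function `compare` and the three example states are hypothetical and are not taken from the ADNI2 or AIBL models.

```python
def compare(a, b):
    """Order relation between two states, each given as a frozenset of
    the functions at a high level.

    With binary levels, 'at least as high on every function and strictly
    higher on at least one' is the same as being a proper superset.
    """
    if a == b:
        return "equal"
    if a > b:   # proper superset
        return "higher"
    if a < b:   # proper subset
        return "lower"
    return "incomparable"

# Hypothetical example states (not the states of Figure 1).
state_a = frozenset({"EM1", "ATT", "VF"})
state_b = frozenset({"EM1", "ATT"})
state_c = frozenset({"ATT", "PS"})

print(compare(state_a, state_b))  # higher
print(compare(state_b, state_c))  # incomparable
```

A linear ordering would force `state_b` and `state_c` into a ranking; the partial order leaves them unranked because each is high on a function the other is not.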
The number of POSET states represents the number of possible cognitive profiles that can be determined from the selected NP tests, shown in Tables 4A,B for ADNI2 and AIBL, respectively. The associated cognitive functions were determined from expert opinion (JJ). The corresponding POSET models are shown graphically as Hasse diagrams in Figures 1A,B, along with the profiles associated with each state. Listed are the functions at a high level. Profiles with undetermined CF functioning levels include CF * in their respective list. Ordering of states is represented by level within a graph, with higher order states at higher levels in the graph. A line connecting two states indicates direct ordering, so that the upper connected state is at least as high a level for all functions as the lower connected state, and at a higher level for at least one function. There were 29 POSET states in ADNI2 and 33 POSET states in AIBL in this analysis: the lowest state (number 29 and 33, respectively) possesses the lowest level of functioning across all cognitive functions, and the highest state (number 1 in both models) represents the highest level of functioning. Every state in between has at least one cognitive function that is at a low level. Thus, each state denotes a distinct cognitive profile.
Each POSET state was assigned a uniform prior probability value of 1/29 for ADNI2 and 1/33 for AIBL to indicate the non-informative prior belief about a participant's profile. Two response distributions were then estimated for each cognitive test, representing the statistical behavior of two groups: subjects in profiles with high-level functioning for all the functions associated with a task, and those with low-level functioning in at least one of the associated functions. Based on a subject's set of responses, Bayes rule was used to update his or her posterior probabilities of state membership. See, for instance (20,21), for details on how posterior probabilities are computed in Bayesian updating. A posterior probability value near 1 for a given state indicates that there is strong empirical evidence for the subject belonging to that state. In contrast, a value near 0 indicates that the subject likely does not belong to the state. Classification was conducted for cognitively normal and MCI subjects. The estimated response distributions of the respective tests reflect tendencies for score values among subjects in the respective populations. Non-parametric approaches were adopted, as the shapes of the response distributions appeared complex (41). We categorized response values into four groups, demarcated by sample quartiles, and estimated the resulting multinomial response distributions using a Bayesian Markov chain Monte Carlo approach (24,25). For the timed data from Trails A and B, Bayesian non-parametric density estimation using normal mixture models and Dirichlet process priors was employed (41). These estimated distributions are used in Bayes rule for computing posterior probabilities of state membership based on the relative likelihood of the observed responses and prior probabilities. Subjects have relatively higher probabilities of performing well (e.g., upper quartile) on tests for which they have a high level of functioning for all of the associated functions. Otherwise, they are expected to perform less well with relatively higher probability.
Finally, for each subject, probabilities of being at a high level were derived for each cognitive function. These values were derived by summing posterior probability values of state membership with associated profiles that indicate high functioning with the specific function. Additional information is available in Tatsuoka (41).
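The updating and summing steps described above can be sketched on a toy model. Everything numeric here is an assumption made for illustration: the three states, the two tests (`test_A`, `test_B`), and the quartile response probabilities in `HIGH_DIST` and `LOW_DIST` are hypothetical, and a single pair of distributions is shared by both tests for brevity, whereas the study estimated distributions per test from data.

```python
# Toy model: states are sets of functions at a high level (hypothetical).
STATES = [frozenset({"ATT", "EM1"}), frozenset({"ATT"}), frozenset()]
# Functions each hypothetical test requires.
TESTS = {"test_A": {"ATT"}, "test_B": {"ATT", "EM1"}}
# Assumed quartile response probabilities (index 0 = lowest quartile).
HIGH_DIST = [0.1, 0.2, 0.3, 0.4]  # all required functions are high
LOW_DIST = [0.4, 0.3, 0.2, 0.1]   # at least one required function is low

def posterior(responses):
    """Bayes rule with a uniform prior over states."""
    weights = [1.0 / len(STATES)] * len(STATES)
    for test, quartile in responses.items():
        required = TESTS[test]
        for i, state in enumerate(STATES):
            dist = HIGH_DIST if required <= state else LOW_DIST
            weights[i] *= dist[quartile]
    total = sum(weights)
    return [w / total for w in weights]

def prob_high(function, post):
    """Sum posterior mass over states in which the function is high."""
    return sum(p for state, p in zip(STATES, post) if function in state)

# Top quartile on test_A, bottom quartile on test_B.
post = posterior({"test_A": 3, "test_B": 0})
print([round(p, 3) for p in post])                 # [0.167, 0.667, 0.167]
print(round(prob_high("ATT", post), 3))            # 0.833
print(round(prob_high("EM1", post), 3))            # 0.167
```

In this toy case, good performance on the attention-only test combined with poor performance on the test that also needs EM1 shifts most of the posterior mass to the state with high ATT and low EM1, which mirrors the attribution logic described for Category Fluency and Boston Naming.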

Statistical Comparison
Mann-Whitney U tests were used to compare, by amyloid status, the respective POSET-derived probabilities of high-level functioning across the range of cognitive functions. Comparisons were conducted within successively finer stratifications by age and number of APOE4 alleles. Age was stratified into three categories: <70 years, 70 up to 80 years, and 81 up to 90 years. Bonferroni corrections were applied for the number of cognitive functions. Thus, the null hypothesis was rejected if the p-value fell below the threshold of 0.05/7 ≈ 0.007.
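The threshold arithmetic and the U statistic itself are simple to sketch. The example below is an illustration only: the p-values in `example_p` and the small samples are made up, the helper names are hypothetical, and the U statistic is computed by direct pair counting without deriving a p-value.

```python
ALPHA = 0.05
FUNCTIONS = ["EM1", "EM2", "EM3", "VF", "ATT", "CF", "PS"]

# Bonferroni-corrected threshold for seven cognitive functions.
threshold = ALPHA / len(FUNCTIONS)

def mann_whitney_u(x, y):
    """U statistic for x relative to y: pairs with x_i > y_j count 1,
    ties count 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in x for b in y)

def flag_significant(p_values, cutoff):
    """Mark each comparison whose p-value falls below the cutoff."""
    return {name: p < cutoff for name, p in p_values.items()}

# Made-up p-values, for illustration only.
example_p = {"EM2": 0.003, "ATT": 0.02}
flags = flag_significant(example_p, threshold)

print(round(threshold, 4))  # 0.0071
print(flags)                # {'EM2': True, 'ATT': False}
print(mann_whitney_u([0.9, 0.8, 0.7], [0.6, 0.5, 0.85]))  # 7.0
```

Note that a p-value of 0.02 would pass an uncorrected 0.05 threshold but not the corrected one, which is the distinction the Discussion draws between the two significance levels.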

Random Forests for Prediction of Amyloid Positivity With Cognitive Tests
We also developed example random forests based on respective cognitive tests in Table 2, in addition to age, gender, education (≥13 years or not), and APOE4 allele count. For ADNI2, we also include the MMSE total score (42). The objective is to assess the relative importance and utility for cognitive testing to predict amyloid status. This is done through analysis of classification error rates and variable importance measured with random forest methods. A key feature of random forests is the generation of an ensemble of classification trees based on bootstrapped samples of the data. Further, at each branch split in a tree, only a subset of randomly selected variables is considered. This helps reduce over-fitting. By classifying out-of-bag (OOB) data (about one-third of original sample) from each bootstrapped sample with the corresponding tree, an accurate estimate of tree-based classification error can be obtained without cross validation. Of great value is the measurement of variable importance in prediction by the mean decrease in accuracy (MDA) across an ensemble of trees by taking out each of the predictor variables individually from the tree fitting process, and assessing the resultant decrease in accuracy per tree. Below, ntree represents the number of trees that are fit per data set, and mtry is the number of variables randomly selected for each branch split (42). The R software package "randomForest" was used.
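The "about one-third" out-of-bag fraction follows from bootstrap sampling: a given subject is missed by all n draws with probability (1 − 1/n)^n, which approaches e^(−1) ≈ 0.368 as n grows. The sketch below checks this by simulation using only the Python standard library. It is not the R "randomForest" workflow used in the study; the function `oob_fraction`, the seed, and the number of repetitions are arbitrary choices, and n = 445 is borrowed from the ADNI2 sample size purely as an example.

```python
import random

def oob_fraction(n, rng):
    """Fraction of n subjects never drawn in one bootstrap sample of size n."""
    drawn = {rng.randrange(n) for _ in range(n)}
    return 1 - len(drawn) / n

rng = random.Random(0)   # fixed seed so the simulation is repeatable
n = 445                  # example sample size
fractions = [oob_fraction(n, rng) for _ in range(200)]

mean_oob = sum(fractions) / len(fractions)
expected = (1 - 1 / n) ** n  # probability a given subject is out-of-bag

print(round(expected, 3))  # 0.367
print(abs(mean_oob - expected) < 0.01)
```

Because each tree is evaluated on subjects it never saw during fitting, the aggregated OOB error serves as an internal estimate of classification error.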

Cognitive Differences by Amyloid and Age
POSET model fit in both studies was good, as reflected by relatively large posterior probability values on one state, and response distribution estimates that reflect the specified order structure (20). Statistical tests stratified by age group revealed differences in cognitive profiles by amyloid status in both studies. In ADNI2, for the <70 years age group, the amyloid positive (A+) group performed significantly worse than the amyloid negative (A−) group at EM1, EM2, EM3, and ATT. No significant differences were found in the AIBL <70 years group at the p < 0.007 threshold. For the 70 up to 80 years age group in ADNI2, significantly worse performance for the A+ vs. A− group was observed for EM1, EM2 and ATT. This group in AIBL demonstrated significant differences in EM2. For study subjects aged 81-90 years, in both ADNI2 and AIBL, there were no significant differences between amyloid groups for any of the cognitive functions. The associated p-values for all the cognitive functions are shown in Table 5.

APOE4 Allele Count, Age, and Amyloid
We next looked at how APOE4 and age predict amyloid status (see Table 6). In many instances, considering age and APOE4 allele count alone appears sufficient for prediction of amyloid status. For example, for APOE4 = 0 and age <70 years or 70-80 years in the ADNI2 sample, predicting amyloid status as negative would have led to accuracy levels of 81.8 and 71.8%, respectively. Also, for APOE4 = 1 and age 81-90 years, or APOE4 = 2 and age 70-80 years or 81-90 years, then respective prediction accuracies were 84.6, 75.0, and 100% for amyloid positive. For other age-APOE4 allele count groupings, accuracy is <70%. The AIBL

Exploratory Analysis of Cognitive Differences by Amyloid, Age, and Cognitive Status
We next stratify by cognitive status (normal or MCI) as well as age group and amyloid status. This extra stratification results in smaller sample sizes and number of amyloid positive subjects per subgroup, hence we view these analyses as exploratory, and do not adjust for multiple comparisons. Mann-Whitney tests were adopted, to assess for differences across cognitive functions by amyloid status. Any differences found below indicate lower functioning for the amyloid positive group. Type I error level was set to 0.05. For ADNI2 cognitively normal subjects, stratified by age group, the following cognitive functions had statistically significant differences between amyloid positive and negatives. For AIBL cognitively normal subjects, stratified by age group, there were no statistically significant differences across cognitive functions between amyloid positive and negatives. For <70 years olds, n = 51, n+ = 8; for 70-80 years olds, n = 83, n+ = 28; for 81-90 years olds, n = 27, n+ = 15. For AIBL MCI subjects, no differences were found for <70 years olds (n = 5, n+ = 3); for 70-80 years olds (n = 13, n+ = 10), trending toward significance was found for EM2 (p = 0.077); for 81-90 years olds (n = 14, n+ = 9), significant difference was found for EM1 (p = 0.042), and trends were seen for EM2 and EM3, and for ATT (p = 0.060 each).

Random Forests for Predicting Amyloid Positivity and Assessing Variable Importance in Prediction
As we saw in the previous section, APOE4 allele count on its own is often quite predictive, depending on age, but not in all scenarios. Hence, this analysis will inform how cognitive tests can augment and improve prediction. Through assessment of variable importance, we can also ascertain prediction performance when APOE4 allele count is removed as a predictor through its mean decrease in accuracy (MDA) value. Given our interest in brief clinical assessment for screening, we consider individual cognitive tests, as opposed to classification at the cognitive function level, which would generally require replication in testing (see Table 7).
Four variables were randomly selected per branch split (mtry = 4, ntree = 1,000). The random forest OOB error rates using all data in ADNI2 and AIBL are 30.88 and 27.14%, respectively. For AIBL, the OOB error rate worsens for the older age groups. The relative ranking of variable importance and the associated mean decrease in accuracy values are informative. In ADNI2, APOE4 is the most important variable in prediction. In the random forest with all data, not including APOE4 allele count as a variable results in a 36% mean decrease in accuracy. This indicates that cognitive testing alone may not be as effective in overall prediction without APOE4 allele status. ADAS Number Cancellation, a measure of attention, has the second highest importance, followed by age and ADAS Delayed Recall. The cognitive tests are in alignment with our POSET-based findings that differences in ATT and EM3 are statistically significant when age is <80 years. For AIBL, APOE4 allele count is by far the most important variable overall for the random forest fit with all the subjects together. Still, a number of episodic memory tasks also have relatively high importance (e.g., >10% mean decrease in accuracy), as does the age variable. These include the CVLT Delayed Recall and List 1-5 tasks, and the CogState One Card and One Back tasks. APOE4 allele count is important for the 70-80 and 81-90 years age groups as well, but interestingly, it is not relatively important in the <70 years age group. In that age group, CogState One Card is the most important variable, and the only one with >10% MDA. In this subgroup, the number of amyloid positive subjects is relatively small, which may be a factor in the low OOB error rate. For the 70-80 years random forest, CVLT Lists 1-5 has an MDA of 16.4%. This test is associated with EM2.

DISCUSSION
We applied POSET models to NP test scores from ADNI2 and AIBL to examine performance in a range of cognitive functions and characterize cross-sectional cognitive function deficit patterns that were associated with the presence of amyloid. These results showed that specific cognitive abilities differed by amyloid status, and that these differences depended on age group. In general, episodic memory, particularly intermediate recall with distraction (EM2), as well as delayed and immediate recall abilities (EM3 and EM1) and attention (ATT), most consistently emerged as being associated with amyloid positivity. In ADNI2, for subjects <70 years old, cognitive differences by amyloid group are clear for EM1-EM3 and ATT even at the strict Bonferroni-corrected threshold of p < 0.007. These differences persist for the 70-80 years group, although EM3 differences are significant only at the p < 0.05 threshold. In AIBL, the differences are less clear, with only EM2 being significant at the stricter threshold for the 70-80 years group. However, at the p < 0.05 threshold, differences arise for EM3 in all age groups, and for EM1 in the 70-80 years and 81-90 years groups. At that significance level, ATT is significantly different for the 81-90 years group.
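As a brief arithmetic illustration of the two thresholds, the sketch below derives a Bonferroni-corrected cutoff from a nominal level of 0.05. It assumes the correction is over the seven cognitive functions considered (three gradations of episodic memory, cognitive flexibility, verbal fluency, attention, and perceptual motor speed). That count is an inference rather than something stated in this section. The function names and example p-values are hypothetical.

```python
def bonferroni_threshold(alpha, n_comparisons):
    """Per-comparison significance threshold under a Bonferroni correction."""
    return alpha / n_comparisons


def significance_level(p_value, strict, nominal=0.05):
    """Label a p-value against the strict (corrected) and nominal thresholds."""
    if p_value < strict:
        return "significant at corrected threshold"
    if p_value < nominal:
        return "significant at nominal threshold only"
    return "not significant"


# Assumed set of comparisons: one per cognitive function (an assumption).
functions = ["EM1", "EM2", "EM3", "cognitive flexibility",
             "verbal fluency", "ATT", "perceptual motor speed"]

threshold = bonferroni_threshold(0.05, len(functions))
print(round(threshold, 3))  # 0.007

# Hypothetical p-values, for illustration only.
print(significance_level(0.001, threshold))
print(significance_level(0.03, threshold))
print(significance_level(0.20, threshold))
```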
Hence, the cognitive functions with differences between amyloid groups are similar across studies. The differences do appear to arise earlier in the ADNI2 cohort. The differences are also more decisive, in terms of smaller p-values, in ADNI2. This could be due to the larger sample size and the higher level of inclusion of MCI, so that differences from cognitively normal subjects are more pronounced. Interestingly, for ADNI2, there are no differences at either significance threshold in the 81-90 years group. This could be in part due to increases in other causes of cognitive impairment with aging, leading to a reduction in cognitive variability among the oldest participants.
In the section APOE4 Allele Count, Age, and Amyloid, it is interesting to see how well amyloid positivity is predicted from APOE4 allele count, in conjunction with age group, in both studies. For instance, when APOE4 = 0 and age is <70 years, amyloid status is likely negative. In contrast, when APOE4 = 1 and age is >70 years, amyloid status is likely positive. When APOE4 = 2, amyloid status appears to be decisively positive. For other APOE4/age combinations, amyloid status is less clear. These results are in line with a prior study that found that an estimated 91% of people with two APOE4 alleles develop AD, at an average age of onset of 68 years, compared with just 47% for a single APOE4 allele, with an average onset of 76 years (43). Carriers of the APOE4 gene also demonstrated a higher degree of cognitive decline (44).
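The qualitative pattern described above can be written down as a simple lookup, sketched here purely as a summary of the text and not as a validated clinical rule. The function name and the returned labels are hypothetical. Age exactly 70 is treated as "unclear" because the text only specifies <70 and >70.

```python
def likely_amyloid_status(apoe4_count, age_years):
    """Summarize the APOE4/age pattern described in the text (illustrative only)."""
    if apoe4_count not in (0, 1, 2):
        raise ValueError("APOE4 allele count must be 0, 1, or 2")
    if apoe4_count == 2:
        return "likely positive"   # two alleles: positive regardless of age
    if apoe4_count == 1 and age_years > 70:
        return "likely positive"
    if apoe4_count == 0 and age_years < 70:
        return "likely negative"
    return "unclear"               # all remaining APOE4/age combinations


print(likely_amyloid_status(0, 65))  # likely negative
print(likely_amyloid_status(1, 75))  # likely positive
print(likely_amyloid_status(2, 60))  # likely positive
print(likely_amyloid_status(0, 75))  # unclear
```

The "unclear" cases are the ones where, according to the analyses reported here, cognitive testing has the most room to add information beyond APOE4 and age.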
The random forest results indicate that reasonable performance in prediction of amyloid positivity is possible with cognitive tests, APOE4 and demographic variables. Importantly, random forest variable importance results by age group give a sense of how importance changes with age. In ADNI2, APOE4 is still most important, and delayed recall measures are second most important. However, in the age group 81-90 years, the MDA for AVTOT6, the second most important variable, is only 4.3%, which indicates that cognitive testing may not be helpful for prediction in this age subgroup. For the 81-90 years age group random forest for AIBL, the OOB error rate is 40.43%, indicating that even with APOE4 genotype, cognitive tests, and demographic variables, prediction of amyloid may be difficult in this age group. In both studies, it appears that for the 81-90 years age group, cognitive tests do not have high variable importance for predicting amyloid positivity.
Overall, the POSET and random forest analyses have strong correspondence. By age group, the cognitive functions identified as differing by amyloid status are also associated with the cognitive tests found to have relatively high variable importance.
The POSET analysis provides scientific support for the selection of cognitive tests for prediction, and corresponds to known cognitive sequelae in AD progression. A practical ramification of the POSET analyses is the focus at the cognitive function level, as opposed to the individual test level, as done in random forests. Although different cognitive test batteries were adopted, the cognitive test importance values are unified across studies by sharing common specifications for functions that significantly differ by amyloid status. This holds promise for flexibility in future screening, in that a range of cognitive tests can be effective in prediction. Considerations of cost and burden of NP tests are important before implementing classification tree algorithms for practical clinical use. Time is often a limiting factor in clinics, and conducting cognitive tests can be time consuming. For example, the ADAS-Delayed Recall Subscale takes roughly 30 min to complete for assessment of the EM3 function. Note that CogState memory tests, which are computerized, were also found to be important. It may thus be possible to streamline administration of tests to minimize impact on clinical flow and test burden.
We also conducted exploratory subgroup analyses by age and cognitive status in the section Exploratory Analysis of Cognitive Differences by Amyloid, Age, and Cognitive Status. In many cases, sample sizes and the number of amyloid positives within a subgroup were small. Still, there are interesting findings for cognitively normal subjects <70 years old in the ADNI2 study, with verbal fluency and attention being significantly different. For the MCI subjects in both studies, it appears episodic memory levels and other functions may be impacted, depending on age group. In prior studies, the differences in cognitive function in cognitively normal and MCI cohorts by amyloid presence have varied: some studies have reported no differences in cognitive performance between healthy A+ and healthy A− groups (1,2), while other studies have reported differences in episodic memory (7,8,45). It is difficult to compare the type (level) of episodic memory deficit between these studies and ours, as they generally do not break cognitive functioning down into the levels used here. Episodic memory has generally been reported as encompassing EM1, EM2, and EM3. Using data from AIBL, Lim et al. (2) observed that the cognitively normal A+ group had subtly lower performance across all NP tests examined, compared to the cognitively normal A− group. In Tatsuoka et al. (22), it was found that POSET values of cognitive functioning were fairly effective for predicting conversion from MCI to AD within 24 months. EM2 was found to be the most promising of all the cognitive functions, in conjunction with APOE4 status. EM3 was less effective than EM2, perhaps owing to confounding by aging, as some non-converters had poor EM3 functioning as well. These findings are consistent with what we have found in the current analysis.
The AIBL MCI cohort at age 70-80 years old has a higher rate of amyloid positivity (72.2%) compared to the same group in ADNI (50.0%). Although this AIBL group also has a lower percentage with education duration <13 years (38.9 vs. 81.1%), the random forest analysis results show only a weak association with education for inferring the presence of amyloid. The higher rate of amyloid positivity may be explained by the higher incidence of APOE4 alleles in this group: 44.4 and 16.7% for 1 and 2 alleles, respectively. The corresponding group in ADNI has rates of 34.4 and 3.3%, respectively. This discrepancy may also be due to our restriction to early MCI in the ADNI2 group, to reduce the proportion of MCI in the sample, and to select less affected subjects. The MCI subjects in AIBL were not characterized as early or late, and hence are likely more heterogeneous in terms of their MCI stage.
Limitations of this analysis include the sample size reductions that resulted from the age group stratification. The posterior probability values being analyzed were highly non-normally distributed, so non-parametric methods along with subgroup stratification (age, APOE4 count, cognitive status) were adopted rather than linear models with covariate adjustment. Also, the ADNI2 and AIBL populations are somewhat clinically and demographically narrow, which makes it difficult to draw generalizable conclusions that can be applied to other populations. The ADNI2 sample has a high proportion of participants with MCI and many years of education, and is mostly Caucasian. The AIBL sample has less MCI but also has relatively lower levels of education, and is also mostly Caucasian. Note that education has been found to have a relatively weak association with memory decline (46), and it likewise does not appear to be an important marker in either study. Also, in AIBL, subjects with non-amnestic MCI were included in the dataset; non-amnestic MCI is not normally associated with progression to AD and may represent different underlying pathology.
We acknowledge that both amyloid and tau could play important roles in AD pathology. The notion that amyloid pathology defines AD has remained largely intact through each successive update to the diagnostic criteria. The amyloid hypothesis predicts that the neurofibrillary tangles and other disease-associated pathologies, including synapse degeneration, hippocampal atrophy and neuroinflammation, are downstream of amyloid pathology and less disease specific. Therefore, if an individual presents with positive amyloid, the current view is that it is consistent with the AD diagnosis criteria, and tau and/or neurodegeneration marker positivity is not necessary. This is a "consensus" view broadly shared by both the NIA-AA diagnostic guidelines (2011-2018) and the International Work Group (IWG) criteria (2007-2014). In our study, we followed the diagnosis criteria, and did not apply the tau positivity status in defining our study population. We acknowledge that the evidence of tau accumulation may help to address the heterogeneity of the study population in terms of AD pathologies. It should be noted that recent guidelines published by the NIA-AA include amyloid, tau and neurodegeneration in a recommended research framework for diagnosing AD (47). However, for these analyses, we chose to keep to the recommended clinical criteria.
Finally, we note that the random forest analysis was for illustrative purposes only and not for clinical use.

CONCLUSION
These findings give insight into how specific aspects of cognitive functioning were associated with amyloid positivity, depending on age, in different samples composed of cognitively normal and MCI subjects. These samples represent potential screening populations. Through POSET models of ADNI2 and AIBL NP test data, cognitive functions were identified as targets for testing to help predict brain amyloid-beta positivity. This approach is more general than selection of specific tests, as it suggests that different cognitive tests can be useful in prediction, as long as they tap into the same cognitive function targets. Indeed, this is what was observed across the ADNI2 and AIBL studies, which adopted different test batteries. The analyses presented here showed that cognitive testing of intermediate and delayed recall (EM2 and EM3) may be particularly useful, as well as attention (ATT). They also indicate that for older subjects (81-90 years), prediction can be more difficult, even with cognitive tests. This finding is reflected in both the ADNI2 and AIBL data sets. These results inform a potential role of cognitive testing in the development of clinical screening tools that inform prediction of amyloid positivity without the use of invasive and expensive approaches such as amyloid PET. The random forest analyses across the two studies suggest that abbreviated cognitive testing that focuses on these respective targets can still lead to moderately high prediction accuracy. Future work will focus on developing efficient and practical classifiers that can be used in clinical settings.

DATA AVAILABILITY
The ADNI data that support the findings of this study are available from the ADNI repository, http://www.adni-info.org. The AIBL data that support the findings of this study are available from the AIBL repository, see www.aibl.csiro.au for further details.

ETHICS STATEMENT
The ADNI study was conducted according to Good Clinical Practice guidelines, the Declaration of Helsinki, US 21CFR Part 50-Protection of Human Subjects, and Part 56-Institutional Review Boards, and pursuant to state and federal HIPAA regulations. Each participating site obtained ethical approval from their Institutional Review Board before commencing subject enrolment. Written informed consent was obtained from all subjects and/or authorized representatives and study partners before protocol-specific procedures were carried out. The AIBL study was conducted according to the Declaration of Helsinki and ethical approval was obtained for each participating site from their institutional ethics committees prior to commencement of enrolment. All participants gave written informed consent before engaging in study protocols. Ethical approval for this study for the analysis of ADNI and AIBL anonymized data was not required.

AUTHOR CONTRIBUTIONS
SC: analysis and interpretation of data and drafting of manuscript. JJ, SB, PH, NM, WW, and PM: interpretation of data and drafting of manuscript. AE: study design and interpretation of data. YW and AL: interpretation of data. ZC: analysis and interpretation of data. CT: study design, analysis and interpretation of data, and drafting of manuscript. All authors read and approved the final manuscript.

FUNDING
This study was funded by Biogen. Authors employed by Biogen that contributed to the design or interpretation of the study include NM, WW, PH, and SB. AE was employed by Biogen at the time he contributed to the work.

ACKNOWLEDGMENTS
ADNI data used in the preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (https://www.adni.loni.usc.edu). The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early Alzheimer's disease (AD). For up-to-date information, see www.adni-info.org. Data collection and sharing for this project was funded by the Alzheimer's Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012).