The largest human cognitive performance dataset reveals insights into the effects of lifestyle factors and aging

Sternberg, Daniel A.; Ballard, Kacey; Hardy, Joseph L.; Katz, Benjamin; Doraiswamy, P. Murali; Scanlon, Michael

doi:10.3389/fnhum.2013.00292

TECHNOLOGY REPORT article

Front. Hum. Neurosci., 20 June 2013

Sec. Cognitive Neuroscience

Volume 7 - 2013 | https://doi.org/10.3389/fnhum.2013.00292


Daniel A. Sternberg¹*, Kacey Ballard¹, Joseph L. Hardy¹, Benjamin Katz², P. Murali Doraiswamy³, Michael Scanlon¹

1. Lumos Labs Inc. San Francisco, CA, USA
2. Combined Program in Education and Psychology, University of Michigan Ann Arbor, MI, USA
3. Department of Psychiatry and Duke Institute for Brain Sciences, Duke University Durham, NC, USA


Abstract

Making new breakthroughs in understanding the processes underlying human cognition may depend on the availability of very large datasets that have not historically existed in psychology and neuroscience. Lumosity is a web-based cognitive training platform that has grown to include over 600 million cognitive training task results from over 35 million individuals, which together constitute the largest existing dataset of human cognitive performance. As part of the Human Cognition Project, Lumosity's collaborative research program to understand the human mind, Lumos Labs researchers and external research collaborators have begun to explore this dataset in order to uncover novel insights about the correlates of cognitive performance. This paper presents two preliminary demonstrations of some of the kinds of questions that can be examined with the dataset. The first example focuses on replicating known findings relating lifestyle factors to baseline cognitive performance in a demographically diverse, healthy population at a much larger scale than has previously been available. The second example examines a question that would likely be very difficult to study at a large scale with laboratory-based and existing online experimental research approaches: specifically, how learning ability for different types of cognitive tasks changes with age. We hope that these examples will provoke the imagination of researchers who are interested in collaborating to answer fundamental questions about human cognitive performance.

Introduction

While many scientific fields ranging from biology to the social sciences are being revolutionized by the availability of large datasets and exponentially increasing computational power, the dominant approach to studying human cognitive performance still involves running small numbers of participants through brief experiments in the laboratory. This approach limits, in important ways, the kinds of questions that can practically be studied. For one, most studies depend on a convenience sample of university undergraduates, limiting the broad applicability of findings (Henrich et al., 2010). The need for research participants to return to the laboratory also limits the ability to study fundamental questions about the variables that influence learning over time and across the lifespan.

Understanding how demographic and lifestyle factors influence cognitive function has important health and policy implications. These questions are often difficult to examine using laboratory-based approaches because they require the experimenter to recruit sufficient numbers of participants across a wide range of demographic backgrounds. Studies of how cognitive performance changes with age tend to compare a sample of university undergraduates to older adults, and as a result can only tell us about the discrete differences between these samples. Since age varies continuously in the population, determining the rate at which performance and learning change with age across the lifespan would require studying a large number of participants across a continuous range of ages. This type of study would be prohibitively time-consuming and expensive to run in a conventional psychology laboratory. Likewise, even the largest observational studies or multi-center controlled clinical trials examining the effects of various interventions on cognitive performance have generally included no more than roughly ten thousand individuals from restricted geographic and demographic backgrounds: for example, Whitehall II (N = 10,314; Marmot et al., 1991) and the Women's Health Initiative Memory Study (N = 8,300; Craig et al., 2005).

The Lumosity platform and dataset

Given the limitations of conventional approaches, it is worthwhile to consider alternative methods to gathering data on human cognitive performance. With the rise of the Internet, web-based research in the behavioral sciences has become more common, particularly in studies of human cognition (Reips, 2004). While concerns remain, the potential of web-based research to recruit larger samples from a wider variety of demographic backgrounds has been widely acknowledged (Kraut et al., 2003; Birnbaum, 2004; Skitka and Sargis, 2006).

Lumosity is a web-based cognitive training platform that includes a suite of cognitive training exercises, assessments, and an integrated training system designed to improve users' cognitive abilities. As the user base has grown rapidly over the past six years, the database of users' cognitive performance has become, to our knowledge, the largest dataset of human cognitive performance. As of January 23, 2013, the dataset includes 36,140,947 users representing 231 distinct ISO-3166 country codes. These users have trained on the cognitive exercises 609,017,147 times and taken online neuropsychological assessments 6,661,302 times (see Figure 1A for screenshots of the game and assessment pages).

Figure 1

In addition to engaging in training tasks and taking assessments, users voluntarily provide demographic information, including their age, gender, and level of education. They also have the opportunity to participate in a number of surveys about health, lifestyle, and real-world cognitive activities (Figure 1B). A user's location can be roughly determined from his or her IP address, which allows researchers to relate approximate geographic information to cognitive performance and to measure geographic reach (Figure 1C).

While internal research using this growing dataset has been ongoing for some time, Lumos Labs has recently begun to work with outside researchers who are also interested in analyzing cognitive performance at large scale, as one arm of the Human Cognition Project (HCP), a collaborative research program to understand the human mind. External researchers interested in analyzing de-identified portions of the dataset apply through the HCP website (http://hcp.lumosity.com). As part of the application process, researchers are asked to present a specific analysis plan. The Lumos Labs research and development team, and in some cases, external research advisors, vet proposals based on the quality of the specific analysis plan. All well-designed proposals are accepted. Lumos Labs allows researchers to publish any findings following from the accepted analysis plan without requiring further consultation with the company. At this time, the large majority of ongoing projects analyzing the Lumosity dataset are focused on basic psychological phenomena that are not directly related to validating cognitive training.

Here, we present two initial demonstrations of the power afforded by examining human cognitive performance at large scale. In the first example, we examine how cognitive performance relates to general health and lifestyle factors, based on a large survey of hundreds of thousands of users from the dataset. In the second example, we look at how task improvements change with age, and how these age-related changes differ for tasks that depend on different cognitive abilities.

Example 1: Health, lifestyle, and cognitive performance

Many lifestyle factors have been shown to influence cognitive abilities, and a cognitively active lifestyle has been linked to reduced levels of potential precursors to dementia (Landau et al., 2012) and a reduced likelihood of developing dementia (Doraiswamy, 2012). For these reasons, we were interested in whether users' initial performance correlated with their self-reported lifestyle habits. In order to examine this question, we designed a survey of health and lifestyle habits that has now been taken by millions of individuals across the world (available at: http://www.lumosity.com/surveys/brain_grade). Here, we focus on two particularly interesting questions about lifestyle habits from this survey that vary continuously in the population: sleep and alcohol consumption. These variables have been included in other surveys that also measured cognitive function (e.g., Marmot et al., 1991), and we were interested in whether the influence of these variables on performance in our user base would correspond to what has been observed in the existing literature.

Methods and materials

We obtained survey data for all users who took the health and lifestyle survey between March 2011 and January 2012. For each of these users, we also obtained their initial scores on three cognitive exercises, where available. These exercises were chosen for reliability as well as coverage: they are some of the most popular training tasks, are shown within the first few days of training, and represent distinct cognitive abilities. The three exercises are described below.

Speed Match is a one-back matching task in which users respond whether the current object matches the one previously shown. Users respond to as many trials as they can in 45 s. We used the number of correct responses the user made before the end of the task as the measure of performance.
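As a minimal illustration of this scoring rule, the Python sketch below counts correct one-back judgments made within the 45 s window. The `Trial` record, its fields, and the function name are hypothetical and are not taken from Lumosity's implementation; the sketch assumes each response carries a timestamp in seconds from task start.

```python
from dataclasses import dataclass
from typing import List

TIME_LIMIT_S = 45.0  # Speed Match duration stated in the text


@dataclass
class Trial:
    """One Speed Match trial (hypothetical record format)."""
    current: str       # object shown on this trial
    previous: str      # object shown on the preceding trial
    said_match: bool   # user's answer: "same as the previous object"
    time_s: float      # response time, in seconds since the task started


def speed_match_score(trials: List[Trial], limit_s: float = TIME_LIMIT_S) -> int:
    """Count correct one-back judgments made before the time limit."""
    return sum(
        1
        for t in trials
        if t.time_s < limit_s and t.said_match == (t.current == t.previous)
    )


# Toy session: two correct answers, one error, and one response after 45 s.
trials = [
    Trial("star", "circle", False, 1.2),    # correct: no match
    Trial("star", "star", True, 2.0),       # correct: match
    Trial("circle", "star", True, 3.1),     # error: said match, but no match
    Trial("circle", "circle", True, 46.0),  # too late, not counted
]
score = speed_match_score(trials)  # -> 2
```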

Memory Matrix is a spatial working memory task in which users are shown a pattern of squares on a grid and must recall which squares were present following a delay. The task uses a variant of a one-up one-down staircase method (Levitt, 1971) to find the user's memory threshold. We used this threshold as the measure of performance.
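The staircase logic can be sketched as follows. This is the plain one-up one-down rule from Levitt (1971), not the unspecified variant used in Memory Matrix: difficulty (here, the number of squares) rises by one after each correct trial and falls by one after each error, and the threshold is estimated as the mean level at the reversal points, which is one common convention. The `respond` callback, the starting level, and the trial count are illustrative assumptions.

```python
from typing import Callable, List


def staircase_threshold(
    respond: Callable[[int], bool],
    start_level: int = 3,
    n_trials: int = 40,
    min_level: int = 1,
) -> float:
    """Plain one-up one-down staircase (Levitt, 1971).

    `respond(level)` runs one trial showing `level` squares and returns True
    if the pattern was recalled correctly. The level goes up by one after a
    correct trial and down by one after an error (never below `min_level`).
    The threshold estimate is the mean level at which the direction reversed.
    """
    level = start_level
    last_step = 0             # +1 = last step was up, -1 = down, 0 = none yet
    reversal_levels: List[int] = []
    for _ in range(n_trials):
        step = 1 if respond(level) else -1
        if last_step != 0 and step != last_step:
            reversal_levels.append(level)  # direction changed on this trial
        last_step = step
        level = max(min_level, level + step)
    if not reversal_levels:
        return float(level)   # no reversals: fall back to the final level
    return sum(reversal_levels) / len(reversal_levels)


# Simulated user who always recalls up to 5 squares and always fails above 5.
toy_threshold = staircase_threshold(lambda level: level <= 5)  # about 5.51
```

With a deterministic simulated user the level oscillates between 5 and 6, so the estimate lands between the largest pattern size reliably recalled and the smallest size always missed; with real, noisy responses a one-up one-down rule converges on the level recalled correctly about 50% of the time.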

Raindrops is a speeded arithmetic calculation task in which new arithmetic problems continuously appear at the top of the screen inside raindrops. Users need to answer the problems before the raindrops reach the bottom of the screen. Once three raindrops have reached the bottom of the screen, the task ends. We used the number of correct responses made before the task ended as the measure of performance.
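The Raindrops termination rule can likewise be sketched as a single pass over a time-ordered event log. The event format below is hypothetical; the sketch encodes only the two facts stated above: the task ends once three raindrops reach the bottom, and the score is the number of correct answers made before that point.

```python
from typing import Iterable, Tuple

MAX_DROPS_AT_BOTTOM = 3  # the task ends once three raindrops reach the bottom


def raindrops_score(events: Iterable[Tuple[str, bool]]) -> int:
    """Count correct answers made before the third raindrop reaches the bottom.

    `events` is a hypothetical time-ordered log of ("answer", is_correct)
    and ("drop_at_bottom", False) entries.
    """
    correct = 0
    drops_at_bottom = 0
    for kind, is_correct in events:
        if kind == "drop_at_bottom":
            drops_at_bottom += 1
            if drops_at_bottom >= MAX_DROPS_AT_BOTTOM:
                break  # task over: anything logged afterwards is ignored
        elif kind == "answer" and is_correct:
            correct += 1
    return correct


log = [
    ("answer", True), ("answer", False), ("drop_at_bottom", False),
    ("answer", True), ("drop_at_bottom", False),
    ("answer", True), ("drop_at_bottom", False),
    ("answer", True),  # after the third drop: not counted
]
raindrops = raindrops_score(log)  # -> 3
```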

Results

Figure 2A provides sample sizes and demographic information for the three tasks. For each task, the relevant measure was first fit to a general linear model including age (up to a 4th-degree polynomial), level of education (approximate years), gender, and the interactions of these variables as predictors. In the case of Speed Match and Raindrops, where the relevant measure was the number of correct responses, the model was instead a Poisson generalized linear model, to account for the count distribution of the scores. The residuals returned by each model were used as the dependent measure for the further analyses.
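To make the covariate-adjustment step concrete, the sketch below fits a Poisson regression with a log link by iteratively reweighted least squares (IRLS) and returns response residuals (observed minus fitted counts). Several details are assumptions rather than the paper's specification: the log link, the use of response rather than deviance or Pearson residuals, standardizing age before taking powers, and a reduced interaction structure (only age × education and age × gender). For Memory Matrix, whose threshold is not a count, an ordinary least-squares fit on the same design would play the same role. The data in the usage lines are simulated.

```python
import numpy as np


def design_matrix(age, edu_years, is_female):
    """Simplified covariate design (an assumption, not the paper's exact model):
    intercept, age up to a 4th-degree polynomial, education, gender, and the
    age x education and age x gender interactions."""
    a = (age - age.mean()) / age.std()  # standardize so age**4 stays well scaled
    e = (edu_years - edu_years.mean()) / edu_years.std()
    g = np.asarray(is_female, dtype=float)
    return np.column_stack([np.ones_like(a), a, a**2, a**3, a**4, e, g, a * e, a * g])


def poisson_glm_fit(X, y, max_iter=50, tol=1e-10):
    """Poisson regression with a log link, fit by iteratively reweighted least squares."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())  # start from the intercept-only fit
    for _ in range(max_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu  # working response
        XtW = X.T * mu           # for the log link, the IRLS weights equal mu
        new_beta = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta


def count_residuals(X, y):
    """Response residuals y - mu_hat: score left over after the covariates."""
    return y - np.exp(X @ poisson_glm_fit(X, y))


# Simulated users (illustrative only).
rng = np.random.default_rng(0)
n = 5000
age = rng.uniform(18, 80, n)
edu = rng.integers(10, 21, n).astype(float)
female = rng.integers(0, 2, n)
X = design_matrix(age, edu, female)
true_beta = np.array([3.0, -0.2, 0.02, 0.0, 0.0, 0.1, 0.05, 0.0, 0.0])
y = rng.poisson(np.exp(X @ true_beta)).astype(float)
resid = count_residuals(X, y)
```

Because the log link is canonical for the Poisson family, the fitted model satisfies Xᵀ(y − μ̂) = 0, so these residuals sum to zero and have no remaining linear association with any column of the design.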

Figure 2

The main effects of self-reported sleep and alcohol intake were measured for each task via separate multivariate linear regression models. These models revealed positive linear and negative quadratic effects of hours of sleep for all three tasks (see Table 1 for model coefficients and relevant statistics). More specifically, we found that cognitive performance in all three tasks was greater for users reporting larger amounts of sleep up to 7 h per night, after which it began to decrease (Figure 2B). The models also revealed significant negative linear and negative quadratic effects of alcohol for all three tasks. Low to moderate alcohol intake was associated with better performance in all three tasks, with performance peaking at a self-reported 1 or 2 drinks per day, depending on the task (Figure 2C), and decreasing as alcohol intake increased from there. The presence of negative quadratic effects for both predictors indicated that the effects of sleep and alcohol intake on performance had an inverted U-shape.
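The inverted-U interpretation can be made concrete: if residual performance r is modeled as r = b0 + b1·x + b2·x², the fitted curve has a single maximum at x* = −b1 / (2·b2) whenever b2 < 0. The sketch below fits such a curve to simulated data built to peak at 7 h and recovers that peak. It is illustrative only: plugging the Table 1 sleep coefficients for Speed Match (1.30 and −2.83) into this formula gives about 0.23 rather than 7, so those coefficients should not be read as raw-scale polynomial terms; we assume they were estimated on a centered or otherwise rescaled predictor.

```python
import numpy as np


def fit_quadratic(x, y):
    """Least-squares fit of y = b0 + b1*x + b2*x**2; returns (b0, b1, b2)."""
    X = np.column_stack([np.ones_like(x), x, x**2])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def peak_location(b1, b2):
    """Vertex -b1 / (2*b2); it is a maximum (inverted U) only when b2 < 0."""
    if b2 >= 0:
        raise ValueError("quadratic term is not negative: no interior maximum")
    return -b1 / (2.0 * b2)


# Simulated residual scores that peak at 7 h of sleep (illustrative only).
rng = np.random.default_rng(1)
sleep_h = rng.integers(4, 11, size=20_000).astype(float)
resid_score = -0.5 * (sleep_h - 7.0) ** 2 + rng.normal(0.0, 2.0, sleep_h.size)
b0, b1, b2 = fit_quadratic(sleep_h, resid_score)
sleep_peak = peak_location(b1, b2)  # close to 7
```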

Table 1

Variable	Speed Match	Raindrops	Memory Matrix
Sleep (linear)	B = 1.30, t = 6.33***	B = 1.47, t = 3.29*	B = 0.17, t = 6.90***
Sleep (quadratic)	B = −2.83, t = −14.77***	B = −8.25, t = −19.74***	B = −0.28, t = −12.3***
Alcohol (linear)	B = −1.40, t = −6.36***	B = −4.96, t = −10.5***	B = −0.14, t = −5.46***
Alcohol (quadratic)	B = −1.37, t = −6.96***	B = −2.19, t = −5.22***	B = −0.07, t = −3.01*

Model coefficients and t-statistics for the linear and quadratic effects of reported hours of sleep and alcohol intake, taken from the grand regression model.

*p < 0.01, **p < 0.001, ***p < 0.0001.

Discussion

The associations between sleep, alcohol intake, and cognitive function observed here are comparable to previous findings from the Whitehall II study. An analysis of Whitehall II participants also found that those who reported around 7 h of sleep showed the highest cognitive performance on a battery of psychological assessments (Ferrie et al., 2011). Another study of the same cohort found that alcohol intake was associated with a reduced likelihood of poor cognitive function (Britton et al., 2004), though this study did not observe the decline in cognitive performance at higher levels of consumption that we found in our analysis. One possible explanation for this difference is that Britton and colleagues focused on whether a participant's cognitive performance fell in the bottom quintile, as a measure of “poor cognitive function,” while our analysis looked at the average performance at each level of alcohol consumption. The increased scale of our dataset may have allowed us to observe this non-linearity in the dose-dependent effects of alcohol consumption. Other unobserved demographic covariates may also partly explain the divergent findings, as the Whitehall II cohort is restricted to civil service workers from the United Kingdom, while Lumosity users come from a wide range of demographic backgrounds and are located all over the world.

As these findings are correlational in nature, there may be other related but unobserved variables that explain some of the effects of alcohol consumption and sleep in our data. For example, the apparent cognitive advantage for those who report moderate alcohol intake may be partly due to greater social and cognitive engagement compared with those who report little or no alcohol consumption. Thus, while we would not strongly assert that the true causal effects of these variables exactly mirror our findings, these results do provide a rough profile of the habits of individuals who tend to show higher cognitive function, which can be filled in as we obtain additional health and lifestyle data. This first example should also serve as a testament to the ability to quickly obtain reliable data from large numbers of individuals using the survey platform, as we were able to gather all of these data solely from new users who had joined the site within a 9-month period.

Example 2: Cognitive task improvements and aging

While aging researchers have discovered a great deal about how baseline performance declines with age for different cognitive abilities (Park, 1999; Salthouse, 2009), less is known about how the ability to learn different kinds of skills changes over the lifespan. Exploring this question using standard laboratory-based approaches would require recruiting a large number of participants across a wide range of ages and bringing them into the lab to perform multiple tasks many times over the course of weeks or months. Existing web-based approaches also face their own difficulties in studying learning over time. Other platforms that have recently become popular for running psychology studies on the web, such as Amazon Mechanical Turk (Buhrmester et al., 2011; Mason and Suri, 2012), are poorly suited for running the multi-session studies necessary to obtain this type of data, and even very recent work measuring cognitive performance across the lifespan at a relatively large scale has to date only examined baseline performance (Hampshire et al., 2012).

This type of data may be difficult to obtain via other web-based platforms in part because, while it is relatively simple to use small payments to individuals and/or online advertising to quickly obtain baseline cognitive performance data from a large number of individuals, there is little incentive for participants to return on a regular basis. In contrast, Lumosity users are specifically interested in cognitive training and are able to train on a large variety of cognitive tasks as often as they would like. As a result, they commonly return regularly over the course of months and years. These unique characteristics make it possible to examine how learning ability changes year by year over the lifespan, and how aging might affect learning differently across distinct cognitive abilities. As a preliminary demonstration of the ability to measure these differences in this dataset, we looked at how a user's age influences how much he or she improves over the course of the first 25 sessions of a cognitive task, and compared tasks that rely on abilities linked to fluid intelligence, such as working memory tasks, vs. those that rely more on crystallized knowledge, such as verbal fluency and basic arithmetic.
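As a sketch of how such an analysis could be set up, the code below computes a per-user improvement score over the first 25 sessions and regresses it on age. The improvement measure (mean of sessions 21 to 25 minus mean of sessions 1 to 5), the helper names, and the simulated learning curves are all hypothetical; this section does not specify how improvement was operationalized. In the simulation, total gains shrink with age for one task type and are age-invariant for the other, mirroring the kind of fluid-versus-crystallized contrast described above rather than any result from the dataset.

```python
import numpy as np

N_SESSIONS = 25  # first 25 sessions of a task, as in the text


def improvement(scores):
    """Hypothetical measure: mean of sessions 21-25 minus mean of sessions 1-5."""
    s = np.asarray(scores, dtype=float)[:N_SESSIONS]
    if s.size < N_SESSIONS:
        raise ValueError("fewer than 25 sessions recorded for this task")
    return s[-5:].mean() - s[:5].mean()


def age_slope(ages, improvements):
    """OLS slope of improvement on age: change in improvement per year of age."""
    slope, _intercept = np.polyfit(ages, improvements, deg=1)
    return slope


# Simulated learning curves (illustrative only): for the "fluid" task the total
# gain shrinks by 0.3 points per year of age; for the "crystallized" task it does not.
rng = np.random.default_rng(2)
ages = rng.uniform(20, 80, 2000)
sessions = np.arange(1, N_SESSIONS + 1)


def simulate_improvements(gain_change_per_year):
    out = []
    for a in ages:
        gain = 30.0 + gain_change_per_year * (a - 20.0)
        curve = 100.0 + gain * (1.0 - np.exp(-sessions / 8.0))
        out.append(improvement(curve + rng.normal(0.0, 5.0, sessions.size)))
    return np.array(out)


fluid_slope = age_slope(ages, simulate_improvements(-0.3))
crystallized_slope = age_slope(ages, simulate_improvements(0.0))
```

Averaging five sessions at each end is one simple way to damp session-to-session noise; fitting a learning curve per user and comparing its parameters across ages would be a reasonable alternative.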