On the Digital Daily Cycles of Individuals

Humans, like almost all animals, are phase-locked to the diurnal cycle. Most of us sleep at night and are active through the day. Because we have evolved to function with this cycle, the circadian rhythm is deeply ingrained and even detectable at the biochemical level. However, within the broader day-night pattern, there are individual differences: e.g., some of us are intrinsically morning-active, while others prefer evenings. In this article, we look at digital daily cycles: circadian patterns of activity viewed through the lens of auto-recorded data of communication and online activity. We begin at the aggregate level, discuss earlier results, and illustrate differences between population-level daily rhythms in different media. Then we move on to the individual level, and show that there is a strong individual-level variation beyond averages: individuals typically have their distinctive daily pattern that persists in time. We conclude by discussing the driving forces behind these signature daily patterns, from personal traits (morningness/eveningness) to variation in activity level and external constraints, and outline possibilities for future research.


Introduction
Almost all life on Earth is affected by the planet's 24-h period of rotation. Humans are no different; the rhythms of our lives are phase-locked with the diurnal cycle. Because our bodies have evolved to cope with the external environment, we have genetic circadian pacemaker circuits that intrinsically follow a period of approximately 24 h (the circadian period length may vary from one person to another and with age, and there are known gender differences [1,2]). The operation of these circadian circuits manifests at various levels (biochemical, physiological, and psychological) and in various markers from hormone levels to body temperature [3][4][5][6]. While our daily rhythms can be modulated by exogenous factors (e.g., decoupling alertness from the sleep/wake cycle [7]), there is a very strong endogenous component in these rhythms, as indicated by the persistence of a near-24 h rhythm in the absence of environmental cues or despite imposition of a non-24 h schedule [8,9].
Within this broader pattern, however, there are substantial inter-individual differences. Such differences are apparent in the existence of chronotypes: morning types and evening types, those who go to bed early and those who find it difficult to wake up early. The traits of morningness and eveningness correlate with distinctive temporal patterns of physiological and psychological variables, such as body temperature and efficiency. They also appear to be linked to gender as well as personality traits; in particular, studies have shown weak negative correlations of morningness with extraversion and sociability [10,11].
The daily rhythms that humans follow are visible in the digital records that are left in the wake of human online activity. Population-level and system-level daily rhythms can be observed in the time variation of activity on YouTube, Twitter and Slashdot, and in the frequency of edits in Wikipedia and OpenStreetMap [12][13][14][15]. They are also seen in the frequency of mobile telephone calls [16,17], and in traces of human mobility derived from mobile phone data [18][19][20]. But what do the circadian patterns displayed by activity levels in an online system actually reveal about human behavior? The behavior of an online system is determined by a number of factors: the day/night cycle, the function and purpose of the system in question (e.g., work-related emails mostly being sent during office hours, see below), the variation of behaviors of user groups (e.g., Wikipedia edits from multiple time zones), and, importantly, variation at the individual level.
In this paper, we discuss findings regarding the daily patterns in electronic records of human communication, along with results of analyses that illustrate such patterns in four different datasets. We start at the aggregate level, studying system-level average patterns and discussing the origins of the findings. From the system level, we move on to the level of individuals, and focus on the variation that remains hidden within system-level averages: individual differences reflected in persistent, distinct daily activity patterns. This part confirms that earlier findings of persistent individual differences in a mobile telephone dataset [21] are general, and that persistent, distinct daily patterns of individuals are common to different communication channels. These findings are important in two ways. First, they show that a better understanding of human behavior requires more focus on individual-level behavior. Second, showing that the behavior of each individual persists in time opens up several new questions about the reasons behind this persistence and about how and why it can be perturbed. We conclude by discussing the implications of these findings, and address future research questions ranging from large-scale analysis of the sleep habits of individuals with big data to daily activity patterns as part of digital phenotypes.

Previous Work
Let us begin by discussing observations of digital daily cycles in different systems at the aggregate level, computed from digital records of communication and online activity. In every instance where the temporal variation of the activity levels in such systems is monitored, the result is a periodic pattern of activity on several time scales [22]. The longest scale is that of a calendar year, where special periods such as holidays can typically be distinguished (see, e.g., [17]). Then there is a weekly cycle, where weekends typically differ from weekdays, and where there can be differences between weekdays as well [12-14, 17, 23]. Finally, there is a daily pattern, which may differ significantly between systems.
We stress that any observed system-level pattern arises from the superposition of a multitude of individual patterns, and attributing system-level behavior to individuals would amount to an ecological fallacy. Therefore, interpreting what the system-level patterns represent remains a non-trivial task. Solving the problem of disentangling the superposition of daily patterns, however, may provide important information about the user population. A good example of this is Yasseri et al. [14], where the authors studied Wikipedia in various languages, and were able to infer the geographical spread of the editor base of each language edition from the assumption that the observed edit frequency cycles are a superposition of circadian patterns in different time zones. The method is based on the argument that Wikipedias in different languages exhibit universal daily patterns, with minima and maxima at around the same time of the day (when correcting for time zones).
Temporal patterns of activity have been studied for different online platforms. For example, in Yasseri et al. [15], the authors look at differences between editing patterns on OpenStreetMap, which is a geo-wiki, for two different cities (London and Rome). Circadian patterns of edits for the two cities have been compared to each other and to that of Wikipedia edits. The authors also followed changes in the circadian rhythms for each of the two cities over several years. In ten Thij et al. [24], daily and weekly patterns of Twitter activity in different languages have been studied and it has been shown that circadian patterns emerge for tweets in all the studied languages. In Noulas et al. [25], the authors have looked at data from Foursquare and found geotemporal rhythms in activity both for weekdays and weekends.
Analysis of aggregate-level daily cycles with geospatial information has been used in the context of cities and transport. As an example, in Toole et al. [26], the authors infer dynamic land use of different parts of a city based on temporal patterns of mobile phone activity in different locations. In Ahas et al. [19], temporal data is combined with location data from mobile phones. Comparing daily rhythms for different days of the week, the authors show a significant difference in the mobility of suburban commuters in the city of Tallinn on weekends as compared to work days. In Louail et al. [20], the authors investigate the daily rhythms of different Spanish cities in terms of spatiotemporal patterns of mobile phone usage, and show how the structure of hotspots, places of frequent usage, allows them to distinguish between different cities. Also, in Grauwin et al. [27], the authors study rhythms of mobile phone traffic records in three global cities on three different continents (London, New York, and Hong Kong). They look at daily patterns at the city level as well as at the local scale within each city, and find that some features are similar across cities while others show patterns distinctive to each city. In Dong et al. [28], Call Detail Record (CDR) data for a period of 5 months from Cote d'Ivoire is used to detect unusual crowd events and gatherings.
As a more applied and non-conventional example of the analysis of daily rhythms, in May 2014 a number of different news outlets (e.g., [29]) described how an elaborate campaign run by Iranian hackers on social media, targeting American officials and figures, was revealed only after analysing the temporal patterns of three years of activity. The daily and weekly activity patterns of the hackers matched precisely the activity profile of Tehran (i.e., low activity at lunch hours of Tehran local time, and little or no activity on Thursdays and Fridays, which are weekend days in Iran).
Finally, let us mention that electronic records contain evidence of daily/weekly patterns that go beyond activity rates. Using network analysis, the authors of [17] show that when mobile telephone calls between individuals are aggregated to form networks, the structural features of those networks differ depending on the starting time of the aggregation process. In particular, weekends differ from weekdays. A probable explanation is that during weekends, communication is mainly targeted to close friends and relatives who reside within the dense core of one's egocentric network. At a smaller scale, in Aledavood et al. [21], the authors show that closest friends are frequently called in the evenings.
Frontiers in Physics | www.frontiersin.org | October 2015 | Volume 3 | Article 73

Results
In this work, we study three different datasets, one with calls, one with calls and text messages, and one containing email records [30]. For calls, we use the Reality Mining dataset [31], and another mobile phone dataset containing data from a small town in a European country with a population of around 8000 people, a subset of the data used in e.g., [32]. For the latter, we also study text messages. For all sets, we use 8-week slices. A summary of different sets can be found in Table 1. Preprocessing of the data is discussed in Section 5.
As the first step, we look at aggregated hourly event frequencies for each of the four different sets (Figure 1). It is clear that while the sleep/wake cycle is apparent in each set, there are also noticeable differences. Calls in the European town show a double-peaked daily curve, whereas the Reality Mining data displays no such pattern. It is possible that this is due to different conventions; students in Boston can be expected to behave differently from people in a small European town. Note that for the Reality Mining data, time zone information is not available, so we have manually shifted the time stamps such that the lowest points correspond to night; this estimate may be inaccurate. However, this only affects the phase of the pattern, not its shape. Interestingly, in both call datasets, the highest peak occurs on the fifth day (Friday). Also note the very low activity level during the weekend in the email data. For email, time stamps are relative to some unknown $t_0$, so the daily cycles appear shifted compared to the other datasets.
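The aggregation described above can be illustrated with a minimal sketch. It assumes that events are available as Unix time stamps interpreted in UTC and that the week is divided into 168 hourly bins starting on Monday; the function name `weekly_hourly_counts` is hypothetical and is not taken from the analysis code behind Figure 1.

```python
from datetime import datetime, timezone


def weekly_hourly_counts(timestamps):
    """Count events in each of the 168 hours of the week.

    Assumption: `timestamps` are Unix time stamps (seconds) read as UTC.
    Bin 0 is Monday 00:00-00:59, bin 167 is Sunday 23:00-23:59.
    """
    counts = [0] * 168
    for ts in timestamps:
        t = datetime.fromtimestamp(ts, tz=timezone.utc)
        counts[t.weekday() * 24 + t.hour] += 1
    return counts


# Illustrative input only: two events in one hour, one in the next.
counts = weekly_hourly_counts([0, 0, 3600])
```

Pooling the events of all users into such bins gives one weekly curve per dataset; averaging over the 8 weeks would then only require dividing each bin by the number of weeks.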
In Figure 2 we focus on the differences between the daily cycles of the various datasets. Here, we plot the average daily patterns in each system on the third day of the week. Since there is no exact time zone information for the Reality Mining and email datasets, we identified the third day of the week by assuming that the two low-activity days correspond to the weekend. We also aligned the timelines by assuming that the lowest activity of the day occurs at 4 AM for all datasets. We then average over the third-day patterns across all 8 weeks in each set. As in Aledavood et al. [33], we find differences between the communication channels: for the small town dataset, the peak of text messages is later than that of calls. This is perhaps due to the different nature of these channels; while getting calls in the late hours might not be appreciated, receiving text messages, which are much less obtrusive, is still acceptable.
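The alignment step can be sketched as a circular shift of a 24-bin daily pattern. This is only an illustration under the stated assumption that the daily minimum should fall at 4 AM; the helper name `align_to_hour` is hypothetical.

```python
def align_to_hour(pattern, target_hour=4):
    """Circularly shift a daily pattern so that its minimum lands on target_hour.

    `pattern` is a list with one value per hour of the day. If several hours
    share the minimum, the earliest one is used (an arbitrary choice).
    """
    n = len(pattern)
    lowest = min(range(n), key=lambda h: pattern[h])
    shift = (target_hour - lowest) % n
    # The value originally at hour (h - shift) moves to hour h.
    return [pattern[(h - shift) % n] for h in range(n)]


# Illustrative pattern whose minimum sits at hour 10 before alignment.
raw = [5] * 24
raw[10] = 1
aligned = align_to_hour(raw)
```

As noted in the text, such a shift changes only the phase of the pattern and leaves its shape untouched.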

Previous Work
In Aledavood et al. [21], two of the present authors investigated individual-level daily cycles in mobile phone call data from 24 individuals (12 male and 12 female) over 18 months. The data collection was performed in a setting where the participants completed high school some months after the collection began, and then started their first year at university, often in another city, or went to work. This design guaranteed a high turnover in their social networks [34], and provided an opportunity to study a major change in their life circumstances. Looking at individual-level daily call patterns, however, it was clear that there were persistent individual differences; each individual has their distinctive daily cycle despite social network turnover and changes in circumstances. This observation speaks in favor of intrinsic factors (such as the aforementioned chronotypes) dominating individual-level variations in daily patterns (see Section 4).
FIGURE 2 | The daily pattern in each of the datasets, computed as an average over all Wednesdays in the data. Colors are the same as in Figure 1. We observe distinct patterns across the various data channels. Email activity is early in the day, whereas (unobtrusive) text messages peak late at night.
FIGURE 3 | The black line shows the average daily pattern for the dataset in question, and therefore is the same in each column, whereas green/red areas denote where this individual's pattern is above or below average. We observe that in almost every case, the individual patterns differ strongly from the average behavior, for example by increased calling frequency during mornings, mid-days, or evenings.

Results
Continuing the analysis of the four datasets, we first calculate for each set the daily patterns for each individual ("ego") by counting the total number of events associated with the ego at each hour of the day through the whole 8 weeks. The counts are then normalized to one for each ego to yield that person's daily activity pattern. As a reference, we also compute the average pattern over all egos from the normalized patterns. Figure 3 displays a sample of the individual-level daily patterns for each dataset. For each set, we have picked three egos to demonstrate individual differences; for each ego, their differences from the aggregated average are emphasized by red and green colors. For all datasets, we can observe clear variation between individuals. Considering the differences between the aggregate and individual daily cycles serves two purposes. While the average pattern in each dataset reveals general underlying mechanisms, the individual patterns show that each person has their own preferences for the timing of communication with others. The daily communication cycles point at variation beyond morningness and eveningness: while individuals clearly have different sleep/wake cycles, they also have their specific patterns during their wakefulness periods.
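The computation of individual and average daily patterns can be sketched as follows. This is a minimal illustration that assumes each event has already been reduced to an (ego, hour of day) pair; the names `daily_patterns` and `average_pattern` are hypothetical.

```python
from collections import defaultdict


def daily_patterns(events):
    """Return {ego: list of 24 fractions summing to one}.

    Assumption: `events` is an iterable of (ego_id, hour_of_day) pairs,
    with hour_of_day an integer in 0..23.
    """
    counts = defaultdict(lambda: [0] * 24)
    for ego, hour in events:
        counts[ego][hour] += 1
    return {ego: [c / sum(row) for c in row] for ego, row in counts.items()}


def average_pattern(patterns):
    """Average the normalized patterns over all egos, hour by hour."""
    n = len(patterns)
    return [sum(p[h] for p in patterns.values()) / n for h in range(24)]


# Illustrative events only: ego "a" is active at 9 and 21, ego "b" at 21.
patterns = daily_patterns([("a", 9), ("a", 9), ("a", 21), ("b", 21)])
reference = average_pattern(patterns)
```

Because the average is taken over normalized patterns, every ego contributes equally regardless of overall activity level; the difference between an ego's pattern and `reference` at each hour corresponds to the green and red areas described for Figure 3.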
To study whether these daily patterns are persistent and thus characteristic for each individual, we use the same methodology as Aledavood et al. [21]: we divide the 8 weeks of data into two 4-week time intervals and use the Jensen-Shannon divergence to measure self and reference distances between patterns. A detailed explanation of these calculations can be found in Section 5. The results are shown in Figure 4. We observe an effect similar to the findings in Aledavood et al. [21]: the daily patterns of individuals tend to be more similar to themselves in consecutive time intervals than to the daily patterns of other individuals in the same time interval. This indicates that individuals have distinct daily patterns that retain their shapes in time. In other words, Figure 4 shows that the individual differences seen in Figure 3 are not just caused by random fluctuations: were fluctuations the reason for individual differences, each individual's patterns in consecutive intervals would be no more similar to each other than to those of everyone else. As self-distances are on average lower, this is clearly not the case.
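The comparison of self and reference distances can be sketched as below. The sketch uses the standard definition of the Jensen-Shannon divergence in terms of Shannon entropies; the base-2 logarithm (which bounds the divergence between 0 and 1) is an assumption, as are the function names and the toy patterns.

```python
import math
from itertools import combinations


def entropy(p):
    """Shannon entropy in bits; zero-probability bins contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def jsd(p1, p2):
    """Jensen-Shannon divergence between two discrete distributions."""
    mix = [(a + b) / 2 for a, b in zip(p1, p2)]
    return entropy(mix) - 0.5 * (entropy(p1) + entropy(p2))


def self_distances(first, second):
    """Distance of each ego's pattern to itself across two intervals.

    `first` and `second` map ego ids to normalized daily patterns for the
    two consecutive 4-week intervals.
    """
    return {ego: jsd(first[ego], second[ego]) for ego in first if ego in second}


def reference_distances(patterns):
    """Distances between all pairs of different egos within one interval."""
    return [jsd(patterns[a], patterns[b])
            for a, b in combinations(sorted(patterns), 2)]


# Toy two-bin patterns, for illustration only.
first = {"a": [0.9, 0.1], "b": [0.1, 0.9]}
second = {"a": [0.8, 0.2], "b": [0.2, 0.8]}
selfs = self_distances(first, second)
refs = reference_distances(first)
```

In this toy case each ego stays closer to its own earlier pattern than to the other ego, which is the qualitative signature of persistence discussed above.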

Discussion
Circadian rhythms have deep roots in human physiology, driven by the environment in which we live. These patterns manifest themselves in different ways at the individual and aggregate levels. There are diurnal patterns that are only visible at the aggregate level in the overall frequencies of various phenomena that are rare or one-time events at the individual level: time of birth, heart attacks, suicides or unethical behavior [35][36][37]. In contrast, the daily rhythms that we have focussed on here originate at the level of individuals, where they manifest as time-dependent event rates of, e.g., digital communication.
FIGURE 4 | Self and reference distances for daily patterns in our datasets. Self-distance measures the distance between one individual's daily patterns in two consecutive 4-week intervals, whereas reference distances are computed between all pairs of individuals in a 4-week interval.
What are the factors that determine an individual's daily rhythm as viewed through the lens of electronic records? The most obvious one is the sleep/wake cycle: we do not send emails or edit Wikipedia while asleep. This is known to be the central driver behind individual differences. First, individuals have different intrinsic chronotypes (morningness/eveningness tendencies [3]). Second, the preferred duration of sleep also varies from one person to another [38]. Third, besides these intrinsic factors, external forcing such as different work schedules also has an effect on the sleep/wake cycle [39].
In addition to differences in the sleep/wake cycle, our alertness and propensity to sleep are distinct for each individual and vary throughout the day. Naturally, individuals go on average through fairly similar cycles of wakefulness and sleepiness, which may explain the qualitatively similar features of aggregate-level daily patterns across different systems. At the level of individuals, however, there are important differences, which are reflected in the observed daily patterns in digital records. As an example, a tired person might be less likely to write an important email or edit a Wikipedia article. Likewise, in addition to these intrinsic alertness cycles, one's daily schedule (work, commuting, etc.) plays a role by imposing constraints on the times when it is possible to send emails or make calls. In terms of daily patterns of telephone calls, things are more complicated, because every call involves two individuals: a caller and a recipient. When calling, one must consider social norms and the availability of the other party.
Understanding which of the factors discussed above dominate the digital daily cycles of individuals and give rise to individual differences and persistent circadian patterns is a task that requires further attention. While the persistence of daily patterns appears to indicate that the intrinsic components (chronotypes, alertness cycles) do play a major role [21], external factors should also be of importance (see, e.g., [40]). Further, it will be necessary to study whether individuals bound by (strong) social ties tend to synchronize their communication and availability.
While analysing digital records at the aggregate level can provide us with invaluable population-level insights and help to replace or improve traditional survey or census methods [26,41], studying the temporal fingerprints of individuals will unveil many new opportunities. As smartphones and other wearable devices are becoming ever more ubiquitous, they also increasingly provide high-velocity, high-volume data streams describing human behavior [42]. This data-collection capability makes these devices excellent tools for research, particularly within health, psychology and medicine, since smartphones allow researchers to study individual behavioral patterns ("digital phenotypes" [43,44]) and their changes over time [45]. Monitoring an individual's digital behavioral patterns on different timescales is also an easy and inexpensive way to support medical intervention, especially in the case of mental health problems, where there are fewer biomarkers than for other types of disease. Data from smartphones have already been used to monitor the time evolution of different measures that are known to be indicative of behavioral changes in patients, which makes daily monitoring and early intervention possible [46][47][48]. As an example, Faurholt-Jepsen et al. [49] suggest that data from mobile phones can be used as an objective measure of symptoms of bipolar disorder.
Because the sleep/wake cycle is a dominant feature of circadian patterns, Big Data describing the digital daily cycles of large numbers of individuals might prove to be highly useful for sleep research. However, obtaining an accurate picture of the sleep times of individuals requires solving several non-trivial problems. While one does not send emails when asleep, emails are not necessarily a reliable proxy for awake-time; it is possible to be awake and not send emails. In this sense inferring the actual times of sleep from electronic records is challenging. This problem is made more severe by the ubiquitous burstiness in human dynamics [32,50,51]: broadly distributed inter-event times make the times from last observation to bed time (or from wake-up to first observation) highly unpredictable. Nevertheless, we believe that this is an important direction for future research.
Finally, a particularly promising source of data comes from large dedicated cell-phone based data collection efforts, focusing on collecting multiplex (face-to-face, telecommunication, online social networks) network data in large, densely connected populations, e.g., [52]. Data from a single communication channel can be too sparse and noisy for obtaining accurate daily patterns; here, having a multiplex dataset can provide a great advantage since one can combine information from all data channels to form a much more comprehensive picture of the activity of each person (e.g., for studying sleeping patterns). Furthermore, if the participants of the dataset are densely connected through social ties, it is also possible to investigate the significance of and correlations between the activity patterns of close personal relations using such a dataset. Finally, a dataset of this nature may function as a kind of "Rosetta stone," helping researchers determine the biases of each electronic dataset, and allowing us to understand to what extent telecommunication data or Twitter datasets with hundreds of millions of active users can be used to study the daily cycles of individuals.

Data Filtering
We have used 8-week time slices of all datasets. Filters have been applied to remove users who are inactive or whose activity is too low for producing meaningful information on daily patterns. In Table 1, the total number of participants means the total number of users who have at least one event during the study period of 8 weeks. For plotting aggregate-level patterns (Figures 1, 2), we have used data from all participants. The column "Active users" in the table represents the number of users who have at least one event per day on average (minimum 56 events in total); these have been used for calculating average daily patterns (Figure 3). For measuring persistence of daily patterns and calculating Jensen-Shannon divergence, we used a subset of active users who have at least one event in each of the two time intervals of 4 weeks.
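The three user sets described above can be sketched as nested filters. This is an illustration only: it assumes events are stored per user as lists of time stamps and that a single `split_time` separates the two 4-week intervals; the function name `select_users` is hypothetical.

```python
def select_users(events_by_user, split_time, min_events=56):
    """Return (participants, active, persistent) as sets of user ids.

    participants: at least one event in the 8-week period.
    active:       at least `min_events` events (one per day on average).
    persistent:   active users with an event in each 4-week interval,
                  where events before `split_time` fall in the first one.
    """
    participants = {u for u, ts in events_by_user.items() if len(ts) >= 1}
    active = {u for u in participants if len(events_by_user[u]) >= min_events}
    persistent = {
        u for u in active
        if any(t < split_time for t in events_by_user[u])
        and any(t >= split_time for t in events_by_user[u])
    }
    return participants, active, persistent


# Illustrative data: time stamps given as day indices 0..55.
data = {
    "u1": list(range(56)),       # active in both intervals
    "u2": list(range(20)) * 3,   # 60 events, all in the first interval
    "u3": [1, 50, 60],           # too few events
    "u4": [],                    # no events at all
}
participants, active, persistent = select_users(data, split_time=28)
```

Each set is a subset of the previous one, mirroring the order in which the filters are applied to the aggregate plots, the average daily patterns, and the persistence analysis.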

Self and Reference Distances
In order to quantify the level of persistence of daily patterns for individuals, we compare the daily patterns of each ego for two consecutive 4-week time intervals. For this, we use the Jensen-Shannon divergence (JSD) and measure the distance of the daily patterns viewed as two probability distributions ($P_1$ and $P_2$). The JSD is calculated as follows: $\mathrm{JSD}(P_1, P_2) = H\left(\tfrac{1}{2}P_1 + \tfrac{1}{2}P_2\right) - \tfrac{1}{2}\left[H(P_1) + H(P_2)\right]$, where $H$ denotes the Shannon entropy.

Author Contributions
TA, SL, and JS designed research. TA analyzed the data. TA, SL, and JS wrote the paper.