Online Food Frequency Questionnaire From the Cohort of Universities of Minas Gerais (CUME Project, Brazil): Construction, Validity, and Reproducibility

Background: The Food Frequency Questionnaire (FFQ) is usually used in epidemiological studies to assess food consumption. However, the FFQ must have good accuracy, requiring its validation and reproducibility for the target population. Thus, this study aimed to describe the construction of the online Food Frequency Questionnaire (oFFQ) used at the Cohort of Universities of Minas Gerais (CUME project, Brazil) and evaluate its validity and reproducibility. Methods: The oFFQ was answered two times in 1 year (March/August 2018—March/April 2019; n = 108 participants—reproducibility), and four 24-h dietary recalls (24hRs) were applied in two seasons of the southern hemisphere [two 24hRs in autumn (March/June 2018) and two 24hRs in winter (August/September 2018); n = 146 participants—validity]. To assess the validity and reproducibility, the intraclass correlation coefficients (ICCs) were estimated. Results: The oFFQ had 144 food items separated into eight groups (dairy products; meat and fish; cereals and legumes; fruits; vegetables; fats and oils; drinks; other foods). In assessing the validity, ICCs for energy and macronutrients were considered moderate, ranging from 0.41 (energy) to 0.59 (protein), while the ICCs for micronutrients were considered low to moderate, ranging from 0.25 (fibers) to 0.65 (vitamin B6). Regarding reproducibility assessment, ICCs for energy and all the assessed items were considered moderate to excellent, ranging from 0.60 (vegetables) to 0.91 (vitamin E and retinol). Conclusions: The self-reported oFFQ had satisfactory validity and reproducibility. So, it can be used to analyze the association between food consumption and chronic diseases in the participants of the Cohort of Universities of Minas Gerais (CUME project—Brazil).


INTRODUCTION
Food consumption evaluation has been commonly used in epidemiological studies since it correlates with health and chronic disease determinants (1,2). However, this process poses many challenges, including intra- and interpersonal variability, interviewer and/or interviewee bias, and adherence to assessment instruments.
In recent years, face-to-face data collection has decreased in epidemiological studies worldwide (3), parallel with an increase in Internet access (4). Thus, the development of instruments in the virtual environment for data collection has become a promising trend (4), including assessing food consumption (5), mainly when large populations are involved.
Online data collection allows sending alerts to users, reducing the study load for participants, and maintaining a distance between researchers and subjects, which limits self-censorship in the interview. In addition, it facilitates processing and improves the reliability of information owing to multimedia support and the elimination of steps related to data entry or scanning of paper forms (6,7).
However, few epidemiological studies in Brazil have used online questionnaires for food consumption assessment (8,9). In this context, the Cohort of Universities of Minas Gerais (CUME project, Brazil) aims to evaluate the impact of the Brazilian food pattern and the nutritional transition on non-communicable diseases (NCDs) in higher education graduates of federal universities located in the state of Minas Gerais, Brazil (10).
To achieve the objective of the CUME project, an online and self-administered Food Frequency Questionnaire (oFFQ) has been used. This method demonstrates the practicality in obtaining and analyzing data, at a low cost, and the possibility of investigating food consumption over a long period (2,11,12).
On the other hand, the Food Frequency Questionnaire (FFQ) has limitations that include the use of a previously defined list of foods, dependence on the memory of previous eating habits, and difficulty in establishing the precise amount of food consumed (13). Besides, the FFQ must have good accuracy, which requires validating it for the target population; the use of data collection instruments validated only in other populations is not recommended. In addition, the FFQ must have good reproducibility to provide greater assurance that the data reported by the interviewees are not due to chance.
Thus, we aimed to describe the construction of the oFFQ used at the CUME project and evaluate its validity and reproducibility.

CUME Project
The CUME project is an open population cohort, which has been developed with graduates from institutions of higher education in the State of Minas Gerais (Brazil), whose design, dissemination strategies, and profile of the first baseline participants were previously reported (10).
This target population was chosen for the CUME study because participants with a high level of education, in general, report data that are considered more reliable (14).
The baseline data collection of the CUME project was carried out between March and August 2016 (first wave) and between March and August 2018 (second wave) with alumni from Universidade Federal de Viçosa (UFV), Universidade Federal de Ouro Preto (UFOP), Universidade Federal de Lavras (UFLA), Universidade Federal de Juiz de Fora (UFJF), and Universidade Federal de Minas Gerais (UFMG) who graduated between 1994 and 2017.
The participants of the CUME project had access to the baseline questionnaire in two parts, separated by a 1-week interval. The first part consisted of questions related to lifestyle, sociodemographic, anthropometric, biochemical, and clinical data; self-reported individual and family morbidity; use of medication; and personal examination history. In the second part, participants completed the oFFQ.

Construction of the Online Food Frequency Questionnaire
The construction of the oFFQ was based on the original version of a quantitative instrument previously validated for the Brazilian population, containing a list of 135 food items to assess the association of food consumption in the last year with NCDs in epidemiological studies (15).
Then, we included food items that are significantly consumed by our target population and representative of all the regions of Brazil (16). These food items were prato and canastra Brazilian cheeses, cottage cheese, lard, cod, chard, mate/black teas, white/green teas, commercially processed juice (light and diet), and light sugar. In addition, the nomenclature of some foods was adapted to better suit the language used in different regions of the country. Food items that indicated food brands were changed to generic names. Foods that were not common in the diet of our target population, such as radiche, morcilla, and keschmier, were excluded. Finally, food items such as "canned fruit juices/tetra brik/with sugar" and "sweetened artificial juices" were merged into the item "processed fruit juice (canned/box/powder)," while items such as "black coffee," "espresso," "cappuccino," and "soluble coffee" were merged into the item "coffee." Food portions of the oFFQ were expressed in household measures commonly used by Brazilians (teaspoon, tablespoon, ladle, pinch, tong, saucer, cup, and glass) or in traditional food portions (unit, slices, and pieces) (17). Each food item had one to three serving options. In addition to the information about the food portion, the questionnaire had fields for the frequency of food consumption (from one to nine or more times) per unit of time (day, week, month, or year).
At the end of the oFFQ, further questions were added to learn about eating habits and practices of the participants that may influence the risk of or protection against NCDs, such as number of meals per day; intake of visible fat on meat; addition of salt and/or sugar to ready meals; consumption of organic foods, lactose-free foods, gluten-free foods, probiotics, and prebiotics; and use of dietary supplements. In addition, explanatory notes for technical terms were included where necessary to aid understanding.

Face and Content Validation of the Baseline Questionnaire and Pilot Studies
A validation of face and content was carried out to evaluate the baseline questionnaire of the CUME concerning comprehensiveness and complexity of understanding, relevance, applicability, clarity, success possibility, absence of bias, items not included, and extension. For this stage, five nutrition researchers from the UFV, UFMG, and the UFOP were invited to evaluate the instrument.
Moreover, two pilot studies were carried out to evaluate the data collection instrument. First, the printed version of the self-completed questionnaire was tested with 25 alumni from the UFV and the UFMG from different training areas. Then, the self-completed online version of the instrument was developed in the virtual environment for data collection of the CUME project; it was evaluated by another 26 former students from the UFV and the UFMG.
At the end of the questionnaire, participants had an open space to write observations and suggestions, which the researchers reviewed. We divided the data collection into two parts, leaving the oFFQ in the second part to facilitate its completion and increase participant compliance. In addition, a photographic album of food portions and utensils was prepared to help participants estimate portion sizes and complete the questionnaire.

Online Photo Album of Food Portions and Utensils
The photographic record was carried out in August 2015, at the Laboratory of Energy Metabolism and Body Composition (LAMECC) of the Department of Nutrition and Health at the UFV.
This photo album was based on food portions and utensils that were used in the oFFQ. Weights (in grams) for small, medium, and large portions were defined according to the Brazilian tables containing weights for food portions and home measures (17,18). However, due to the lack of tables for some foods, such as meats, fruits, and vegetables, these were adapted and weighed at LAMECC using a portable precision scale (BS 3000A, Bioprecisa, Curitiba, Brazil) with a 3,000 g capacity and 0.1 g sensitivity. In addition, small and medium portions were considered 50 and 75%, respectively, of the weight of the large portion (19), with a variation of up to 30%.
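The portion rule described above can be illustrated with a minimal sketch. The function name `portion_weights` and the 200 g example are hypothetical, and the sketch applies only the nominal 50% and 75% ratios; it does not model the permitted variation of up to 30%.

```python
def portion_weights(large_g: float) -> dict:
    """Derive nominal small/medium/large weights (grams) from the large portion.

    Assumption: small = 50% and medium = 75% of the large-portion weight,
    as stated in the text; the allowed variation is ignored here.
    """
    if large_g <= 0:
        raise ValueError("large_g must be positive")
    return {
        "small": 0.50 * large_g,
        "medium": 0.75 * large_g,
        "large": large_g,
    }


# Hypothetical food whose large portion weighs 200 g.
weights = portion_weights(200.0)
print(weights)
```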
To elaborate the online photo album, foods were pre-prepared (cleaned, peeled, and cut) and prepared according to good food handling practices to guarantee the quality of the food and photographs. The prepared foods were divided into portions, similar to those presented in the oFFQ. All prepared foods were placed on white porcelain, with aluminum cutlery or in glasses, and immediately photographed to avoid the loss of sensory characteristics.
The photographs were taken with a standardized neutral color background, using a three-dimensional digital camera (Cyber-Shot DC-WX 100, 18.2 megapixels, Sony Brand, Manaus, Brazil). In addition, all photos were marked with a watermark using the logo of the CUME project.
A total of 42 food items and utensils were photographed individually at different angles and from different distances, yielding 800 photographic images of food items and 160 of utensils. We carefully assessed and standardized the angle and distance, selecting the photographs that gave the best detail of the portion size and the utensil.
For food items that were not included in the online photo album, a photo of another food item with a similar portion size or of the same nature was presented (e.g., if a participant consumed a glass of whole milk, they would use the photograph of the juice portions, including the glass, as a reference). The photos were organized to provide better visibility and comparability.

Online Food Frequency Questionnaire Validation and Reproducibility
Sample and Data Collection
A total of 1,357 graduates answered the baseline questionnaire between March and August 2018. At the end of each week, the virtual platform developed for the CUME project automatically provided a report with the names and e-mail addresses of the participants who had completed the data collection. Subsequently, we randomly selected 150 participants and invited them by e-mail to take part in the study of the validity and reproducibility of the oFFQ, informing them of the objectives and procedures to be used.
To assess the validity of the oFFQ, the 24-h dietary recall (24hR) was used as the reference method for comparison. The participants who responded positively to the study invitation were contacted by cell phone in two different seasons of the year in the southern hemisphere to capture the variability in food consumption that occurs throughout the year with seasonal changes in climate. In the first period, two 24hRs were carried out on two random and alternate days of the week (from Monday to Friday) between March 20 and June 21 (autumn), 2018. In the second period, the other two 24hRs were also carried out on two random and alternate days of the week (from Monday to Friday) between August 20 and September 23 (winter), 2018.
In both periods, the 24hRs were conducted by previously trained interviewers following the Multiple-Pass method (20). Participants were asked to report all food consumption from the day before the call, describing each meal, time, and place, and then detailing food quantities. At the end of each 24hR, the information was reviewed and possible unreported items were investigated. We also attached to the study invitation e-mail the online photo album of the oFFQ food items and utensils, the same one used on the virtual data collection platform for the baseline oFFQ, referred to in this study as oFFQ1. The interviews were written down on paper.
In the first period, the 24hRs were applied to 150 participants. Of these, four participants were excluded because they reported implausible energy consumption (<500 kcal/day or >6,000 kcal/day) (21), resulting in a sample of 146 participants. In the second period, 12 participants did not respond to the interviewers after five telephone contact attempts. Thus, the second pair of 24hRs was performed with only 134 participants. For reproducibility, participants in the validity stage received an access link to the oFFQ on the virtual platform of the CUME project to complete the instrument again. As a result, between March and April 2019, 108 participants answered the oFFQ a second time, referred to in this study as oFFQ2 (Figure 1).

Food Consumption Analysis
Intake of each food item from the oFFQ or the four 24hRs was transformed into daily consumption (grams or milliliters) by multiplying the number of portions (one to nine) by the portion size (in grams or milliliters), and then dividing the result by the number of days in the reported time unit (day: 1; week: 7; month: 30; or year: 365).
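As an illustration of this conversion, the sketch below computes daily consumption for a single food item. The function and variable names are hypothetical; only the divisors (1, 7, 30, 365) come from the text.

```python
# Number of days in each reported time unit, as given in the text.
DAYS_PER_UNIT = {"day": 1, "week": 7, "month": 30, "year": 365}


def daily_intake(portions: float, portion_size: float, unit: str) -> float:
    """Convert a reported frequency into daily consumption (g or mL per day).

    portions     -- number of portions consumed per time unit
    portion_size -- size of one portion in grams or milliliters
    unit         -- one of "day", "week", "month", "year"
    """
    if unit not in DAYS_PER_UNIT:
        raise ValueError(f"unknown time unit: {unit!r}")
    return portions * portion_size / DAYS_PER_UNIT[unit]


# Hypothetical answer: three 150 g portions per week.
example = daily_intake(3, 150.0, "week")
print(round(example, 1))
```

Keeping the divisors in a single lookup table makes the 30-day month and 365-day year conventions explicit and easy to audit.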
Caloric intake, nutrients (carbohydrates, proteins, fats, vitamins, and minerals), and other specific components (fibers, carotenoids, and sugars) were calculated according to the nutritional composition of each source food provided by the Household Budget Survey (22), with the help of Excel (version 2010) and SPSS (version 19). Furthermore, the food items were separated in two ways: (a) according to the eight food groups present in the oFFQ, which were organized according to nutritional similarity (dairy products; meat and fish; cereals and legumes; fruits; vegetables; fats and oils; beverages; other foods) and (b) according to the NOVA classification (23), which divides foods according to the degree of industrial processing into four groups: in natura/minimally processed, culinary ingredients, processed, and ultra-processed (Supplementary Table 1).

Data Analysis
The sample was characterized with the distribution of frequencies or means (SDs) of the sociodemographic (age, skin color, marital status, education level, regular work in the last 12 months), anthropometric (BMI = weight/height²), and lifestyle variables [smoking (no, ex-smoker, smoker) and alcohol consumption (no, yes)].
The values of energy consumption, nutrients, food groups according to nutritional similarity, and food groups according to the degree of industrial processing derived from the 24hRs were disattenuated (corrected for intra-individual variability), generating a single value per item for participants who responded to four (n = 134) or only two (n = 12) 24hRs. For this, we used the PC-SIDE program (Department of Statistics, Iowa State University, Iowa, United States), developed by the National Research Council and Iowa State University (24, 25). Because each participant had a single value, the whole sample (n = 146) could be used in the validation analysis. Moreover, consumption values were adjusted for energy intake by the residual method (26).
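The energy adjustment can be sketched as follows. This is a minimal illustration of the residual method in its usual form (regress the nutrient on total energy, then add the residual to the intake predicted at mean energy); the function name and the example data are hypothetical, and the authors' exact implementation in SPSS is not described in the text.

```python
from statistics import mean


def energy_adjust(nutrient, energy):
    """Energy-adjust nutrient intakes by the residual method.

    Fits nutrient = a + b * energy by ordinary least squares, then returns
    residual + predicted intake at the mean energy intake for each person.
    """
    if len(nutrient) != len(energy) or len(nutrient) < 2:
        raise ValueError("need two equal-length sequences with >= 2 values")
    mean_e, mean_n = mean(energy), mean(nutrient)
    sxx = sum((e - mean_e) ** 2 for e in energy)
    if sxx == 0:
        raise ValueError("energy intake has no variance")
    slope = sum((e - mean_e) * (n - mean_n)
                for e, n in zip(energy, nutrient)) / sxx
    intercept = mean_n - slope * mean_e
    at_mean_energy = intercept + slope * mean_e
    return [n - (intercept + slope * e) + at_mean_energy
            for e, n in zip(energy, nutrient)]


# Hypothetical data: protein (g/day) and energy (kcal/day) for four people.
protein = [55.0, 70.0, 62.0, 90.0]
kcal = [1600.0, 2100.0, 1900.0, 2600.0]
adjusted = energy_adjust(protein, kcal)
print([round(v, 1) for v in adjusted])
```

By construction, the adjusted values keep the original mean but are uncorrelated with total energy intake.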
The means and SDs of the values of energy consumption, nutrients, food groups according to nutritional similarity, and food groups according to industrial processing degree were calculated for the estimates derived from the oFFQ1, 24hRs, and oFFQ2. For the oFFQ validity and reproducibility analyses, ICCs were calculated between the consumption values derived from the oFFQ1 and 24hRs, and from the oFFQ1 and oFFQ2, respectively. The ICCs were classified as excellent (≥0.75), moderate (≥0.40 to <0.75), and low (<0.40) (27).
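The sketch below shows how an ICC for two paired measurements can be computed and classified with the cut-offs above. The text does not state which ICC form was used, so the one-way random-effects form, ICC(1,1), is an assumption made here for illustration; the function names and example data are hypothetical.

```python
def icc_oneway(pairs):
    """One-way random-effects ICC(1,1) for subjects measured twice.

    pairs -- list of (measurement_1, measurement_2) tuples, one per subject.
    Assumption: the ICC form is not given in the text; ICC(1,1) is used here.
    """
    n, k = len(pairs), 2
    if n < 2:
        raise ValueError("need at least two subjects")
    grand = sum(a + b for a, b in pairs) / (n * k)
    subject_means = [(a + b) / k for a, b in pairs]
    # Between-subject and within-subject mean squares.
    msb = k * sum((m - grand) ** 2 for m in subject_means) / (n - 1)
    msw = sum((a - m) ** 2 + (b - m) ** 2
              for (a, b), m in zip(pairs, subject_means)) / (n * (k - 1))
    denom = msb + (k - 1) * msw
    if denom == 0:
        raise ValueError("no variance in the data")
    return (msb - msw) / denom


def classify_icc(icc):
    """Apply the cut-offs from the text: >=0.75, 0.40 to <0.75, <0.40."""
    if icc >= 0.75:
        return "excellent"
    if icc >= 0.40:
        return "moderate"
    return "low"


# Hypothetical energy intakes (kcal/day) from two administrations.
data = [(1800, 1900), (2200, 2100), (2500, 2650), (1500, 1450), (3000, 2900)]
value = icc_oneway(data)
print(round(value, 2), classify_icc(value))
```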
The consumption of energy, nutrients, food groups according to nutritional similarity, and food groups according to industrial processing degree of all the participants estimated by the oFFQ1, 24hRs, and oFFQ2 was categorized into tertiles to evaluate the percentage of agreement between the measurements. Agreement was classified as exact (the participant fell in the same tertile in the comparison between oFFQ1 and 24hRs or between oFFQ1 and oFFQ2), adjacent (adjacent tertiles), or discordant (opposite tertiles).
Finally, to assess the differences between the values of energy, nutrients, and food group consumption between the oFFQ1 and the 24hRs, the method proposed by Bland and Altman (28) was used. For this, we constructed scatter plots with the absolute differences between the values of oFFQ1 and 24hRs (oFFQ1 - 24hRs) on the y-axis and the mean of the values obtained by oFFQ1 and 24hRs [(oFFQ1 + 24hRs)/2] on the x-axis.
Moreover, to display the results of the Bland and Altman concordance analysis, we chose the three nutrients with the highest (vitamin B6, calcium, and vitamin D) and the three with the lowest (fibers, added sugar, and sodium) ICC values in the comparison between oFFQ1 and 24hRs. All statistical analyses were conducted in the SPSS program (version 19) at a significance level of 5%.
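A minimal sketch of this tertile cross-classification is given below. The names are hypothetical, and tertiles are assigned by rank within each instrument, which is one common convention; the text does not specify how ties or group boundaries were handled.

```python
def tertiles(values):
    """Assign each value to tertile 0, 1, or 2 by rank (ties keep input order)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    groups = [0] * n
    for rank, i in enumerate(order):
        groups[i] = min(rank * 3 // n, 2)
    return groups


def tertile_agreement(first, second):
    """Percentage of exact, adjacent, and discordant tertile classification."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty sequences of equal length")
    counts = {"exact": 0, "adjacent": 0, "discordant": 0}
    for a, b in zip(tertiles(first), tertiles(second)):
        gap = abs(a - b)
        key = "exact" if gap == 0 else "adjacent" if gap == 1 else "discordant"
        counts[key] += 1
    return {k: 100.0 * v / len(first) for k, v in counts.items()}


# Hypothetical fiber intakes (g/day) from two instruments, six participants.
ffq = [12.0, 18.0, 25.0, 30.0, 22.0, 15.0]
recall = [10.0, 20.0, 19.0, 28.0, 24.0, 11.0]
result = tertile_agreement(ffq, recall)
print(result)
```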
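The quantities plotted in such a graph can be computed as in the sketch below. The function name and example data are hypothetical; the mean difference (bias) and the 95% limits of agreement (bias ± 1.96 SD) are the conventional summary lines of a Bland and Altman plot and are included here as an assumption, since the text describes only the axes.

```python
from statistics import mean, stdev


def bland_altman(ffq, recall):
    """Return plot points and summary lines for a Bland and Altman analysis.

    Each point is (mean of the two methods, FFQ minus recall).
    """
    if len(ffq) != len(recall) or len(ffq) < 2:
        raise ValueError("need two equal-length sequences with >= 2 values")
    diffs = [f - r for f, r in zip(ffq, recall)]
    means = [(f + r) / 2 for f, r in zip(ffq, recall)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return {
        "points": list(zip(means, diffs)),
        "bias": bias,
        "lower": bias - 1.96 * sd,
        "upper": bias + 1.96 * sd,
    }


# Hypothetical calcium intakes (mg/day) from the two methods.
summary = bland_altman([900.0, 1100.0, 750.0, 1300.0],
                       [850.0, 1000.0, 800.0, 1150.0])
print(round(summary["bias"], 1),
      round(summary["lower"], 1),
      round(summary["upper"], 1))
```

The `points` list can be passed directly to any plotting tool, with the three summary values drawn as horizontal reference lines.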

RESULTS
A total of 146 participants (66.4% women; 34.4 ± 8.6 years old) from CUME were included in this study (Table 1). Most of them self-declared as white, had individual incomes of up to five minimum wages, and held full-time jobs. Regarding lifestyle, 8.2% reported smoking, 70.5% consumed alcoholic beverages, and 78.8% practiced physical activity at least once a week. In addition, overweight (BMI ≥ 25.0 kg/m²) was observed in 40.4% of the participants.
Regarding the validity of the oFFQ, most of the mean values of energy, nutrient, and food group consumption from the oFFQ1 were higher than those from the 24hRs. Overall, the agreement between the oFFQ1 and the 24hRs was moderate, with an average ICC of 0.44 and an exact + adjacent percentage agreement of 88.1%. Energy and all macronutrients also showed moderate agreement between the instruments, with the following ICCs: 0.41 for energy, 0.50 for carbohydrates, 0.51 for lipids, and 0.59 for proteins. ICCs for micronutrients ranged from 0.28 (sodium) to 0.65 (vitamin B6). For food groups defined by nutritional similarity, the ICCs ranged from 0.34 (fats and oils) to 0.62 (fruits). Concerning the degree of industrial processing, agreement was moderate for the ultra-processed (ICC = 0.60) and processed (ICC = 0.54) food groups, and low for the in natura/minimally processed foods (ICC = 0.36) and culinary ingredients (ICC = 0.36) groups (Table 2).
Moreover, the concordance analysis of Bland and Altman showed that data were homogeneous (Figure 2).
Regarding reproducibility, the mean values of energy, nutrient, and food group consumption were similar between the two administrations. Overall, the agreement between the oFFQ1 and the oFFQ2 was excellent, with an average ICC of 0.76 and an exact + adjacent percentage agreement of 92.5%. Energy and all evaluated items showed moderate-to-excellent agreement, with ICCs ranging from 0.60 (vegetables) to 0.91 (vitamin E and retinol) (Table 3).

DISCUSSION
In this study, we demonstrated that the oFFQ used at the CUME had moderate validity and excellent reproducibility for total energy consumption and most nutrients and food groups.
Regarding validity, compared with the 24hRs, the oFFQ presented moderate ICC values for most of the items evaluated, results that were similar to the previous national (2,29) and international (30,31) studies.
These ICC values may be influenced by the fact that most of the means of energy, nutrients, and food group consumption in the oFFQ1 were higher than those measured in the 24hRs. In this sense, 24hRs provide more detailed and less biased data than FFQ (32). In general, the FFQ overestimates food consumption (33)(34)(35).
Although the ICCs of some micronutrients and food groups were low (iron, vitamin B1, vitamin B12, meat and fish, cereals and legumes, fats and oils, other foods, fresh/minimally processed foods, and culinary ingredients), the values were very close to the threshold for moderate agreement (0.40); furthermore, the average exact + adjacent tertile agreement was high (88.1%), with all items above 80%, a result also close to those observed in Brazilian studies on the subject (2,36). Interestingly, the mean differences by the Bland and Altman method were minor, and the data were homogeneous. These findings were consistent with those of a cohort study conducted with a sample of Brazilian middle-aged adults (2).
Low ICCs of some micronutrients and food groups may have been influenced by the following factors: (a) the FFQ consists of a set list of food items, whereas the 24hRs allow quantifying all foods and beverages consumed in the period before the interview. This potentially causes consumption overestimation in the FFQ because participants tend to report greater food intake and to include items that are rare in their daily diet (37). The higher mean consumption of energy and of most nutrients and food groups measured by the oFFQ relative to the 24hRs reinforces this statement; (b) we used the Multiple-Pass method (20) to apply the 24hRs and minimize errors in measuring the diet. This method allowed greater detail on the foods consumed and investigation of sugar or sweetener added to coffee, tea, milk, and juice, as well as the use of spices, sauces, and olive oil on vegetables, which may have resulted in higher mean consumption of these items in the 24hRs; (c) a higher mean consumption of fiber in the oFFQ compared with the 24hRs could be explained by overestimated intake of source foods considered socially desirable, such as fruits, vegetables, and legumes (38); and (d) the 24hR is a method often used to validate the FFQ; however, it is itself subject to memory bias and errors in estimating portion size.
In addition to the validity and reproducibility of nutrients, the present study extended the assessment to food groups, since individuals do not consume isolated nutrients but meals consisting of foods and nutrients (39). To date, validity and reproducibility assessments by food group have considered only nutritional similarity as a grouping criterion (40). To our knowledge, this is the first study assessing the validity and reproducibility of food groups according to the degree of industrial processing (41).
The oFFQ used at CUME proved to be valid compared with the 24hRs, showing that the questionnaire can measure what it is intended to measure (29,42). This result is important not only for this study but also for nutritional epidemiology and public health, since online data collection instruments are very useful and practical in countries with a large geographical extension, such as Brazil, as they dispense with face-to-face meetings between researchers and participants (6,7).
TABLE 3 | Mean and SD of daily consumption, intraclass correlation coefficient (ICC), and percentage (%) of concordance between the online food frequency questionnaire 1 (oFFQ1) and the online food frequency questionnaire 2 (oFFQ2) (n = 108 participants).
Regarding reproducibility, when compared with the oFFQ2, the oFFQ1 presented excellent ICC values for most of the items evaluated, and these results are congruent with those observed in studies on the same subject (29,42); furthermore, in the analysis based on tertiles, the average exact + adjacent agreement between the oFFQ1 and the oFFQ2 was almost perfect (92.5%), a result similar to those reported in other studies (2,37).
Reproducibility is the ability of an instrument to produce similar estimates in two different moments with the same accuracy (29); therefore, the fact that the oFFQ showed to be reproducible is fundamental for the CUME project because evaluations of the diet and food intake of the participants will be carried out on different occasions over time due to its longitudinal design.
To ensure the quality of the information, some precautions were taken in carrying out the study: assessing seasonal food consumption, with the 24hRs collected in two different seasons of the year; applying the second oFFQ after an interval chosen to avoid both fundamental changes in diet and recall of the answers given in the first questionnaire (43); training all nutritionists beforehand with a standardized script using the Multiple-Pass method (20); and sending a photo album with utensils and household measures to aid participants during the food survey (36). Additionally, the study is innovative because validity and reproducibility were assessed for food groups defined by the degree of industrial processing.
Nonetheless, this study has limitations. First, we collected the 24hRs in only two seasons of the year (autumn and winter), so dietary variability across the whole year was not captured. However, other validation studies also applied two 24hRs at shorter intervals (43,44) to improve study adherence; 24hR interviews on two non-consecutive days are recommended in the European Food Safety Authority guidance on the European Union menu methodology for nationwide individual food consumption studies (45) and endorsed by researchers of the European Food Consumption Survey Methods group (46); and this study went further by collecting four 24hRs in two different seasons. Second, 12 participants (8.2%) did not respond to the two 24hRs applied in the second period of the validation study, which may have compromised the assessment of food consumption variability for these participants. Third, participants were lost between the validity and reproducibility stages of the oFFQ, which could reduce the power of the statistical tests. This does not seem to have occurred, however, since all the ICCs were significant and were higher in the reproducibility stage.
Finally, it is worth highlighting some advantages of the oFFQ method: it is low cost, simple to analyze, and easy to apply; it does not modify consumption over time; and it can classify individuals according to their usual eating patterns and associate these with health conditions, which makes it feasible for use in population studies (2,47).

CONCLUSIONS
Based on the results presented, we conclude that the online FFQ developed by the CUME project can be used with satisfactory validity and reproducibility to analyze the association between food consumption and NCDs in adults with a high level of education, the target population of the CUME project; however, correction factors must be applied to some nutrients and food groups in future analyses of the project involving food consumption.

DATA AVAILABILITY STATEMENT
The datasets presented in this article are not readily available because the data belong to a still ongoing longitudinal project involving several academic institutions and therefore cannot be shared yet. Requests to access the datasets should be directed to Adriano Marçal Pimenta, adrianomp@ufmg.br.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved under protocol no. 596.741-0/2013. The patients/participants provided their written informed consent to participate in this study.