Ethnicity-Specific Features of COVID-19 Among Arabs, Africans, South Asians, East Asians, and Caucasians in the United Arab Emirates

Background Dubai (United Arab Emirates; UAE) has a multinational population, which makes it an exceptionally interesting study sample because of its unique demographic factors. Objective To stratify the risk factors for the multinational society of the UAE. Methods A retrospective chart review of 560 patients sequentially admitted to inpatient care with laboratory-confirmed COVID-19 was conducted. We studied patients’ demographics, clinical features, laboratory results, disease severity, and outcomes. The parameters were compared across different ethnic groups using tree-based estimators to rank the ethnicity-specific disease features. We trained machine learning (ML) classification algorithms to build a model of ethnic specificity of COVID-19 based on clinical presentation and laboratory findings on admission. Results Out of 560 patients, 43.6% were South Asians, 26.4% Middle Easterns, 16.8% East Asians, 10.7% Caucasians, and 2.5% were classified as others. UAE nationals represented half of the Middle Eastern patients, and 13% of the entire cohort. Hypertension was the most common comorbidity in COVID-19 patients. Subjective complaints of fever and cough were the chief presenting symptoms. Two-thirds of the patients either had mild disease or were asymptomatic. Only 20% of the entire cohort needed oxygen therapy, and 12% needed ICU admission. Forty patients (~7%) needed invasive ventilation and fifteen patients died (2.7%). We observed differences in disease severity among different ethnic groups. Caucasian or East Asian COVID-19 patients tended to have more severe disease despite a lower risk profile. In contrast, Middle Eastern COVID-19 patients had a higher risk factor profile, but they did not differ markedly in disease severity from the other ethnic groups. There was no noticeable difference between the Middle Eastern subethnicities (Arabs and Africans) in disease severity (p = 0.81). However, there were disparities in the SOFA score, D-dimer (p = 0.015), fibrinogen (p = 0.007), and background diseases (hypertension, p = 0.003; diabetes and smoking, p = 0.045) between the subethnicities. Conclusion We observed variations in disease severity among different ethnic groups. The high accuracy (average AUC = 0.9586) of the ethnicity classification model based on the laboratory and clinical findings suggests the presence of ethnic-specific disease features. Larger studies are needed to explore the role of ethnicity in COVID-19 disease features.


INTRODUCTION
During the pandemic, the impact of coronavirus disease 2019 (COVID-19) on society varied considerably from country to country. To compare nations, researchers estimated the impact using the case fatality ratio (CFR). In the middle of 2020, the CFR was 3.7% in mainland China, 15.1% in the UK, and 14.2% in Italy (Grasselli et al., 2020;Mortality Analysis, 2020;Yang et al., 2020). Many factors may account for the difference in CFR across world regions, e.g., population density and settlement, the proportion of the elderly in the society, the affordability and accessibility of national healthcare systems, and the ethnic background, which implies genetic variation.
Studies on ethnic disparities of COVID-19 are challenging since there is no well-defined concept of ethnicity. On the one hand, this term refers to self-identification of people with a particular cultural group based on customs, norms, and ideologies. On the other hand, ethnicity is the cultural and genetic heritage of the person's ancestors. The country of birth may inappropriately identify ethnicity because of the global tendency towards migration (Clarke et al., 2008). Another valid biological category for medical studies is race. Hypothetically, there is an association between genes that determine race and health. This does not comply with the data that genetic variations within races are more pronounced than between them (Egede, 2006).
Several studies and systematic reviews have explored whether ethnicity is a risk factor for severe COVID-19. However, the relationship between ethnicity and severe acute respiratory syndrome-related coronavirus 2 (SARS-CoV-2) infection remains uncertain. There are some inconsistencies in the findings on the association between ethnicity and clinical outcomes, including hospitalization. Moreover, a common limitation of such studies is that they focus on West European or North American communities and consider other ethnicities as minorities. Though important for clinical risk stratification and proper patient management, such data are limited for the multinational communities of the Gulf region, with only a few papers providing information on this issue (Almazeedi et al., 2020;Hannawi et al., 2021).
There is ambiguity regarding the factors that account for the dissimilarities in COVID-19. Some authors emphasize disparities in the amount and location of adipose tissue, whereas others point to socioeconomic disparity (e.g., food insecurity and involvement in high-risk frontline jobs) (Islam et al., 2020;Krams et al., 2020). Environmental factors can also contribute to the diversity in the course of SARS-CoV-2 infection among people of different origins. The community of Dubai Emirate (United Arab Emirates; UAE) is an exceptionally interesting study sample because of its unique set of environmental, population, and economic factors. There is a pressing need to investigate these aspects of COVID-19 because this will contribute to proper risk management and foster further development of community medicine.

Studies on Ethnicity-Related Dissimilarities of COVID-19 in the Middle East
Commonly, researchers from the Middle East compare ethnicity subgroups by such aspects as migration during the pandemic and mental state. They do not explore the association of COVID-19 severity and outcomes with the patient's race or ethnicity. Some studies provide descriptive statistics on the ethnicity and citizenship of the participants (Al-Rifai et al., 2021;Hannawi et al., 2021), including pediatric patients (Elghoudi et al., 2020;Al-Rifai et al., 2021). Few of them categorize patients into Arabs and Asians to analyze racial differences in disease severity and outcomes with adjustment for comorbidities (Deeb et al., 2021). A unique study of antibody titers and epitope coverage was carried out predominantly on Caucasians and South Asians, with few Middle Eastern patients included (Smith et al., 2021). One study in Israel compared the impact of COVID-19 on Jewish and Arab populations (Haklai et al., 2021). Table 1 and Figures 1 and 2 summarize the study cohorts of 50 open-access papers that reported original findings on the dissimilarities of COVID-19 among ethnic groups. The papers were retrieved consecutively from the Google Scholar search engine with a query comprising the following keywords: "race, ethnicity, COVID-19". Here, we characterize the research methodology traditionally used in such studies and discuss some of its limitations. The majority of the studies related to ethnicity and COVID-19 outcomes were conducted by scientists from either the USA or the UK (Bassett et al., 2020;Hsu et al., 2020;Karaca-Mandic et al., 2021;Lundon et al., 2020;Rentsch et al., 2020;Gianfrancesco et al., 2021;Kabarriti et al., 2020). The US population is traditionally divided into Hispanic, non-Hispanic Whites, non-Hispanic Blacks, Asians, and certain minorities including Alaska Natives. The above-mentioned studies did not include Arabs as a separate ethnic entity.
Many studies have been devoted to an association between mortality from COVID-19 and race (Bassett et al., 2020;Hsu et al., 2020;Kabarriti et al., 2020;Lundon et al., 2020;Gold et al., 2020;Golestaneh et al., 2020;Gianfrancesco et al., 2021;Price-Haywood et al., 2020;Rentsch et al., 2020;Rossen et al., 2020), an effect of ethnicity on hospitalization (Hsu et al., 2020;Lundon et al., 2020;Price-Haywood et al., 2020;Gianfrancesco et al., 2021), admission to intensive care unit (ICU) (Hsu et al., 2020), and the need for mechanical lung ventilation (Hsu et al., 2020;Lundon et al., 2020). Few studies explain the influence of ethnicity on COVID-19 outcomes in a socio-demographic context (Lundon et al., 2020;Raifman and Raifman, 2020;Millett et al., 2020). Some scientists from North America compared the mortality rate among people of different origin before and after the COVID-19 pandemic (Golestaneh et al., 2020). Authors compared laboratory findings among ethnic subgroups of patients with COVID-19 (Price-Haywood et al., 2020). Researchers adjust the study subgroups with regard to comorbidities and background conditions (Lundon et al., 2020;Golestaneh et al., 2020;Gianfrancesco et al., 2021).

Studies on Ethnic Features of COVID-19 Across the World
For epidemiological analysis, scientists from the UK aggregate data from the UK Biobank (Niedzwiedz et al., 2020), the National Health Service registry (Aldridge et al., 2020), the OpenSAFELY platform (Collaborative, 2021), and a few hospitals (Sapey et al., 2020;Apea et al., 2021). The specialists compare Blacks, Whites, South and East Asians, and minority ethnic groups, and they do not study peculiar features of COVID-19 among Arabs as a separate ethnic group (Alaa et al., 2020;Aldridge et al., 2020;Harrison et al., 2020;Hull et al., 2020;Niedzwiedz et al., 2020;Patel et al., 2020;Sapey et al., 2020;Apea et al., 2021;Collaborative, 2021;Nafilyan et al., 2021). The research objectives of these studies are to investigate the relationship between race and the incidence of SARS-CoV-2 infection (Niedzwiedz et al., 2020;Collaborative, 2021), disease outcomes (Harrison et al., 2020;Patel et al., 2020;Alaa et al., 2020), and both hospitalization and ICU admission (Apea et al., 2021). To adjust for the supposed risk factors, various authors stratify the study samples by age, body mass index (BMI), comorbidities (Harrison et al., 2020), the number of people in the household, the accommodation type, income, and other factors (Nafilyan et al., 2021). While the majority of researchers aggregate data on epidemiology and patient management, few of them integrate ethnic-specific patterns of COVID-19 with laboratory findings, e.g., biochemical data of positively tested individuals (Apea et al., 2021).

The Timeline of the Spread of COVID-19 in the UAE
The United Arab Emirates was among the first countries to implement strict measures to control the pandemic, e.g., closing the national borders, limiting internal public movements and gatherings, shutting down schools in favor of distance learning, and implementing work-from-home protocols. These measures smoothed the peak of the disease incidence and allowed the healthcare system to reorganize hospitals to manage the case load effectively, creating multiple field hospitals that could handle mild cases and converting several hotels to isolation facilities run by healthcare staff. On January 29, 2020, the UAE announced its first confirmed case of COVID-19. It was the first country in the Middle East to register a case of COVID-19. The first patient was a Chinese tourist who arrived in the UAE from Wuhan on January 16. By the end of January, there had been five confirmed cases of COVID-19 in the UAE. On January 30, WHO declared the novel coronavirus outbreak a Public Health Emergency of International Concern (PHEIC) (Tayoun et al., 2020). Figure 3 shows the major steps taken by the UAE government to limit the spread of COVID-19.
The UAE population has increased substantially in the last two decades as a result of the remarkable growth of its economy. The economic growth has led to an influx of expatriate workers from all over the world. This accounts for the uneven age distribution (more than two-thirds of the population is between 25 and 45 years of age) and the heterogeneous ethnic backgrounds. According to the Dubai Statistics Center, the population of Dubai in 2019 was 3.3 million, which constitutes 34% of the whole UAE population. Only 1.2% of residents are older than 65 years of age Yang et al., 2020;Zhou et al.,

OBJECTIVES
We intended to stratify the risk factors for the multinational society of Dubai. To address this objective, we had the following specific tasks:
• Explore the ethnic-specific features of COVID-19 by analyzing the disease course in people of different ethnic groups.
• Rank the most significant features that account for the ethnic-specific course of COVID-19 in the Dubai population.
• Build a model of ethnic specificity of COVID-19 based on the clinical presentation and laboratory findings on admission and evaluate its performance.

Study Sample
We obtained retrospective data routinely collected as part of standard primary and secondary care. The study sample comprised all COVID-19 patients consecutively admitted to Mediclinic Parkview Hospital, Dubai (UAE), from the date of the first confirmed case, February 26, 2020, until May 31, 2020. At the beginning of the pandemic, all patients with COVID-19 verified by reverse-transcriptase polymerase chain reaction (PCR) were hospitalized at Mediclinic regardless of disease severity or medical insurance coverage. This made our study cohort representative of the entire Dubai population. The cohort included many asymptomatic and mild cases. A thorough description of the flowchart for the management of COVID-19 patients is given in our recent paper (Statsenko et al., 2021). Details of the current study are provided below. The inclusion criteria were as follows: (1) aged 18 years or above; (2) positive SARS-CoV-2 real-time PCR from a nasopharyngeal swab; and (3) inpatient admission. Patients meeting the inclusion criteria were followed until discharge. The national guidelines regulated inpatient management. As a part of the standard of care, baseline blood tests and inflammatory markers were obtained. Multiplex PCR assays were used to test respiratory samples for influenza and other respiratory viruses. Supportive oxygen therapy was initiated if oxygen saturation measured with a pulse oximeter dropped below 94% or the respiratory rate increased above 30 breaths per minute. Patients who were clinically suspected of having superimposed bacterial pneumonia were administered empirical broad-spectrum antibiotics at the discretion of the treating physician.
The antiviral and antimalarial therapies were guided by "National Guidelines for Clinical Management and Treatment of COVID-19", a standardized guideline for all health sectors in the UAE (National Emergency Crisis and Disasters Management Authority, 2020).
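The eligibility and oxygen-therapy rules described above can be written as two small predicates. This is only an illustrative sketch: the function and parameter names are hypothetical and are not taken from the study's data pipeline, and the thresholds are copied from the text.

```python
def meets_inclusion_criteria(age_years: float, pcr_positive: bool, inpatient: bool) -> bool:
    """Inclusion criteria from the text: adult, PCR-positive, admitted as inpatient."""
    return age_years >= 18 and pcr_positive and inpatient


def needs_supplemental_oxygen(spo2_percent: float, resp_rate_per_min: float) -> bool:
    """Oxygen was started if SpO2 dropped below 94% or respiratory rate rose above 30/min."""
    return spo2_percent < 94 or resp_rate_per_min > 30


# Example: a 45-year-old inpatient with a positive PCR, SpO2 92%, 22 breaths/min
print(meets_inclusion_criteria(45, True, True))   # eligible for the cohort
print(needs_supplemental_oxygen(92, 22))          # oxygen indicated by SpO2
```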
It was a single-center study with a relatively short duration (3 months), as the UAE government standardized healthcare services for COVID-19 patients during this period in the following way. All patients with a positive PCR test were hospitalized in either government-funded or private health facilities even if they were asymptomatic. The diagnostics and treatment of COVID-19 were provided free of charge in accordance with the National Guidelines (National Emergency Crisis and Disasters Management Authority, 2020). Because of this, the study sample is representative of the adult population of the country. Dubai Mediclinic is affiliated with the governmental medical school, Mohammed Bin Rashid University of Medicine and Health Sciences, and is the optimal center for medical teaching and research in Dubai.

Data Collection
We extracted the electronic health records of all consecutive patients with either admission or discharge diagnosis of COVID-19, coronavirus infection, unspecified, SARS-associated coronavirus as the cause of diseases classified elsewhere, other coronavirus as the cause of diseases classified elsewhere, and pneumonia due to SARS-associated coronavirus (ICD10 codes U07.1, B34.2, B97.21, B97.29, and J12.81). Patients with negative SARS-CoV-2 PCR were excluded from the study. The information from electronic health records was extracted using a standardized data collection form by trained researchers. The form was adapted from ISARIC Rapid Case Record Form (COVID-19 CASE RECORD, 2020), see Subsection 3.3.
Variables of interest were manually extracted from electronic health records. The abstraction team of 7 physicians was trained and supervised by the principal investigator. To ensure the accuracy of the data entered, the team worked in the following manner. A team member extracted the required information to a data collection sheet. Another physician double-checked the information from the data collection sheet before entering it to an Excel spreadsheet. Any discrepancies were resolved by the supervisor.

Dataset Description
In our study, we classified the subjects into the following ethnic groups: Middle Easterns (Arabs of both the UAE and non-UAE nationality, Africans from the countries mentioned in Table 2), South Asians (patients from Afghanistan, Bangladesh, Bhutan, Maldives, Nepal, India, Pakistan, and Sri Lanka), East Asians (patients from China, Hong Kong, Philippines, Taiwan, Japan, Mongolia, North Korea and South Korea), Caucasians, and others.
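A minimal sketch of the grouping rule above, assuming the country of origin is used as the lookup key. The South Asian and East Asian sets are copied from the text. The Middle Eastern list lives in Table 2 and the Caucasian group is not enumerated here, so both are left as caller-supplied parameters; all names are hypothetical.

```python
SOUTH_ASIAN = frozenset({
    "Afghanistan", "Bangladesh", "Bhutan", "Maldives",
    "Nepal", "India", "Pakistan", "Sri Lanka",
})
EAST_ASIAN = frozenset({
    "China", "Hong Kong", "Philippines", "Taiwan",
    "Japan", "Mongolia", "North Korea", "South Korea",
})


def ethnic_group(country, middle_eastern=frozenset(), caucasian=frozenset()):
    """Map a country of origin to one of the five study groups.

    `middle_eastern` and `caucasian` must be supplied by the caller
    (e.g. from Table 2); they are not defined in this sketch.
    """
    if country in middle_eastern:
        return "Middle Eastern"
    if country in SOUTH_ASIAN:
        return "South Asian"
    if country in EAST_ASIAN:
        return "East Asian"
    if country in caucasian:
        return "Caucasian"
    return "Others"
```

Anything that matches none of the supplied sets falls through to "Others", mirroring the residual category in the dataset.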
To determine the case severity, we used the UAE National Guideline for Clinical Management and Treatment of COVID-19 (National Emergency Crisis and Disasters Management Authority, 2020):
• Asymptomatic form: a patient with no symptoms.
• Mild form: clinical symptoms of upper respiratory tract infection and no signs of pneumonia.
• Moderate form: fever and respiratory symptoms with radiological findings of pneumonia.
• Severe form: any one of the following criteria: respiratory distress (respiratory rate > 30/min), oxygen saturation < 93% at rest, P/F ratio of less than 300.
• Critical form: any of the following criteria: acute respiratory distress syndrome (ARDS) with P/F ratio < 200, sepsis, multiorgan failure, altered level of consciousness (GCS < 13).
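The severity grading listed above can be sketched as a cascade that checks the most severe category first. This is an illustrative simplification, not the guideline's official algorithm: the argument names are hypothetical, "symptoms" is collapsed into one flag, and a missing P/F ratio is treated as not meeting the threshold.

```python
from typing import Optional


def classify_severity(
    has_symptoms: bool,
    pneumonia_on_imaging: bool,
    resp_rate: float,
    spo2_rest: float,
    pf_ratio: Optional[float] = None,
    ards: bool = False,
    sepsis: bool = False,
    multiorgan_failure: bool = False,
    gcs: int = 15,
) -> str:
    """Return the severity label using the thresholds quoted in the text."""
    # Critical: ARDS with P/F < 200, sepsis, multiorgan failure, or GCS < 13
    if (ards and pf_ratio is not None and pf_ratio < 200) or sepsis \
            or multiorgan_failure or gcs < 13:
        return "critical"
    # Severe: any one of RR > 30/min, SpO2 < 93% at rest, P/F < 300
    if resp_rate > 30 or spo2_rest < 93 or (pf_ratio is not None and pf_ratio < 300):
        return "severe"
    # Moderate: symptomatic with radiological pneumonia
    if has_symptoms and pneumonia_on_imaging:
        return "moderate"
    # Mild: symptomatic without pneumonia
    if has_symptoms:
        return "mild"
    return "asymptomatic"
```

Ordering the checks from critical down to asymptomatic means a patient who meets several criteria is assigned the highest applicable grade.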
The list of variables comprising the complete cohort dataset is as follows:
• Demographics features. Patient's age, gender, ethnicity, weight, height, body mass index (BMI), and occupation (for the patients who are healthcare or laboratory workers), travel history within 14 days prior to symptom onset, and exposure to a confirmed case of COVID-19.
• Comorbidities. History of chronic cardiac disease, hypertension, chronic lung disease, asthma, chronic kidney disease, diabetes, active malignant cancer, immunosuppressed state, human immunodeficiency virus infection (HIV), active smoking, and pregnancy status in female patients. Medication history: whether the patient was taking any of the following medications prior to the admission: angiotensin-converting enzyme inhibitors (ACE-I), angiotensin II receptor blockers (ARB), and/or non-steroidal anti-inflammatory drugs.
• Symptoms at presentation. Cough, sputum production, sore throat, chest pain, shortness of breath (SOB), fever, headache, confusion, nausea or vomiting, diarrhea, myalgia, malaise, and loss of smell or taste.
• Vital signs. Temperature, heart rate (HR BPM), respiratory rate, systolic (SBP) and diastolic blood pressure (DBP), SpO2, and sequential organ failure assessment (SOFA) score at the time of admission and at the time of transfer to ICU if applicable.
• Laboratory findings. The following parameters were collected on admission and at the peak of illness: white blood cell (WBC) count, lymphocyte count, platelet count, activated partial thromboplastin time (APTT), the activity of the enzymes alanine aminotransferase (ALT), aspartate aminotransferase (AST), lactate dehydrogenase (LDH), and creatine kinase (CK), and the concentration of the total bilirubin, D-dimer, creatinine, C-reactive protein (CRP), sodium ion, troponin, ferritin, and fibrinogen. Blood hemoglobin and serum sodium were recorded on admission.
• Case management and clinical course. Medications used: antiviral medication, azithromycin, other intravenous antibiotics, antimalarial, antifungal medication, IL-6 blocker "Tocilizumab", convalescent plasma, steroids (either intravenous or oral), low-molecular-weight heparins, supplemental oxygen, invasive ventilation, vasopressors, and extracorporeal membrane oxygenation. The length of hospital stay, the duration between symptom onset and admission (in days), the duration between the first positive SARS-CoV-2 PCR and the first negative set (first of 2 consecutive negative PCRs), need for ICU care, and the duration of stay in ICU.
• Complications.
  • Septic shock was defined according to the 2016 Third International Consensus Definition for Sepsis and Septic Shock (Singer et al., 2016).
  • Bacterial pneumonia was diagnosed when patients showed clinical symptoms or signs of pneumonia and a positive culture of a new pathogen was obtained from lower respiratory tract specimens (qualified sputum, endotracheal aspirate, or bronchoalveolar lavage fluid) after admission.
  • Bacteremia was diagnosed when patients showed clinical symptoms or signs of systemic infection and one or more positive blood culture that was not thought to be a contaminant.
  • ARDS was diagnosed according to the Berlin definition (Force et al., 2012).
  • Acute cardiac injury was diagnosed if serum levels of cardiac biomarkers (troponin I) were above the 99th percentile upper reference limit, or if new abnormalities were shown in echocardiography.
  • Acute kidney injury was defined according to Kidney Disease Improving Global Outcomes (KDIGO) (Khwaja, 2012). It is based on the highest serum creatinine level and urine output. Specifically, the diagnosis could be made if there is an increase in serum creatinine levels by 0.3 mg/dl or greater (26.5 µmol/L or greater) within 48 h.
  • Liver injury was diagnosed if there was an increase in liver enzymes (AST or ALT) over 3 times the upper limit of normal.
  • Seizure, meningitis, or encephalitis confirmed by CSF analysis and culture, cardiac arrhythmia, cardiac arrest, myocarditis (if clearly documented by a cardiologist or an intensivist), new onset cardiomyopathy (if the baseline cardiac function is unknown, assume new), critical illness myopathy or neuropathy (if documented by an attending physician or diagnosed with the electrophysiologic testing), bleeding or disseminated intravascular coagulation (DIC), the use of renal replacement therapy, the development of pressure ulcer, and other complications.
• Primary Outcome. Discharged alive.
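Two of the complication definitions above are simple numeric thresholds and can be sketched as follows. The function names and the upper-limit-of-normal (ULN) arguments are hypothetical, since the laboratory reference ranges are not stated in the text. Only the creatinine part of the KDIGO definition quoted above is implemented; urine output is ignored.

```python
MG_DL_TO_UMOL_L = 88.4  # unit conversion factor for serum creatinine


def aki_by_creatinine(baseline_mg_dl: float, highest_within_48h_mg_dl: float) -> bool:
    """Creatinine criterion from the text: a rise of 0.3 mg/dl or more within 48 h."""
    return (highest_within_48h_mg_dl - baseline_mg_dl) >= 0.3


def liver_injury(ast: float, alt: float, ast_uln: float, alt_uln: float) -> bool:
    """Liver injury: AST or ALT above 3 times its upper limit of normal."""
    return ast > 3 * ast_uln or alt > 3 * alt_uln


# The 0.3 mg/dl threshold expressed in µmol/L matches the 26.5 µmol/L in the text
print(round(0.3 * MG_DL_TO_UMOL_L, 1))
```

The ULN values are passed in explicitly because they depend on the assay used by the reporting laboratory.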

Statistical Analysis
The data were checked for accuracy and then for normality using the Shapiro-Wilk test. None of the attributes were normally distributed, so non-parametric tests were used to compare each pair of independent samples. The bivariate relationships between the features were assessed with the Mann-Whitney U test or Kruskal-Wallis test for the continuous variables, and with Fisher's exact test or the Chi-square test for the categorical ones. As we intended to find features inherent to a specific ethnic group, we also evaluated the differences between each group and all the others. Machine learning (ML) classification model. We utilized ML algorithms to check whether there were unique patterns within the data that could unambiguously identify the ethnic group (Middle Eastern, South Asian, East Asian, Caucasian, and Others). In our dataset, the ethnic group "Others" was in the minority (14 patients, 2.5%), so we excluded it from the analysis.
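To make the group comparisons above concrete, the sketch below computes the two test statistics in plain Python: the Mann-Whitney U statistic for a continuous variable and the Pearson chi-square statistic for a contingency table. It is an illustration only, with hypothetical function names and made-up toy inputs. It does not compute p-values, for which a statistics package would normally be used, and it is not the authors' analysis code.

```python
def mann_whitney_u(x, y):
    """U statistic of sample x: count of pairs (a, b) with a > b; ties count 0.5."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Toy example: one group versus the rest for a continuous marker,
# and a 2 x 2 table of comorbidity (yes/no) by group (made-up numbers)
print(mann_whitney_u([4.1, 5.0, 6.2], [3.0, 3.9, 4.5]))
print(chi_square_statistic([[10, 20], [20, 10]]))
```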
The list of variables used to build the model was as follows:
• Physical examination on admission: temperature, HR BPM, SBP, DBP, the time elapsed between two successive R-waves on the electrocardiogram (RR/min), oxygen saturation (SpO2), SpO2 on room air vs. oxygen therapy, Glasgow coma scale (GCS), and SOFA score.
• Symptoms on admission: cough, sputum, sore throat, chest pain, SOB, fever, headache, confusion, having any gastrointestinal symptom (e.g., nausea, vomiting, diarrhea), myalgia, malaise, and loss of smell or taste.
• Laboratory findings on admission: the count of platelets, WBC, and fractions of leukocytes; the concentration of hemoglobin, total bilirubin, D-dimer, creatinine, sodium, CRP, troponin, ferritin, and fibrinogen; the activity of ALT, AST, CK, and LDH; and the length of APTT.
Feature selection. To assess the importance of the features fed to the ML models as classifiers by ethnicity, we employed four ensemble tree-based estimators: AdaBoost, Gradient Boosting, Random Forest, and Extra Trees. These models were trained on the whole dataset and used to rank the features in ascending order of their predictive potential.
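The ranking step can be sketched as follows, assuming each fitted estimator has already produced one importance score per feature. The normalization of each model's scores to sum to one before averaging is an assumption of this sketch, and the function name and example numbers are hypothetical.

```python
def average_importance_ranking(importances_by_model):
    """Average per-feature importance across models and sort in ascending order.

    `importances_by_model` maps a model name to a {feature: importance} dict.
    Each model's scores are rescaled to sum to 1 so that no model dominates.
    """
    features = list(next(iter(importances_by_model.values())))
    averaged = {}
    for feature in features:
        shares = []
        for scores in importances_by_model.values():
            total = sum(scores.values())
            shares.append(scores[feature] / total)
        averaged[feature] = sum(shares) / len(shares)
    # Ascending order: least informative feature first, most informative last
    return sorted(averaged.items(), key=lambda item: item[1])


# Made-up importances for two of the four estimators
example = {
    "random_forest": {"crp": 0.2, "ldh": 0.8},
    "extra_trees": {"crp": 0.4, "ldh": 0.6},
}
for name, score in average_importance_ranking(example):
    print(name, round(score, 3))
```

Averaging across several ensembles is a common way to damp the instability of any single model's importance estimates.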

The Cross-Ethnic Groups
Out of 560 patients, 43.6% were South Asians, 26.4% were from the Middle East, 16.8% were East Asians, 10.7% were Caucasians, and 2.5% were classified as Others (see Tables 2, 3). The UAE nationals represented half of the Middle Eastern patients, i.e., 13% of the entire cohort. Overall, males accounted for two-thirds of the study population, which remained true across different ethnic groups except for the East Asians, where the gender distribution was almost equal. Table 2 lists nationalities in the Middle Eastern ethnic group. The comparison of the patients of the Middle Eastern subethnicities (Arabs and Africans) is given in Table 4. There were marked differences in the SOFA score, the level of D-dimer (p = 0.015), and fibrinogen (p = 0.007) between the subethnic groups on admission. The pronounced disparity between the Arabs and Africans in the background diseases (hypertension, p = 0.003; diabetes and current smoking, p = 0.015) may account for the mentioned differences. We did not find a noticeable difference between the subethnic groups in disease severity (p = 0.81).
Comorbidities. Hypertension was the most common comorbidity, present in 20.54% of the study cohort, with no remarkable differences between groups (p = 0.345). Diabetes was present in 17.50% of patients, and its prevalence differed significantly among ethnic cohorts (p = 0.0001). The Middle Eastern population had the highest proportion (25%) of patients with diabetes, and the prevalence was higher than in the other ethnic groups (p = 0.005). In comparison, East Asians and Caucasians had a substantially lower proportion of patients with diabetes (8.51%, p = 0.01 and 1.67%, p = 0.006, respectively). Active smoking was present in 6.4% of cases, and almost half of the smokers were Middle Easterns.
Symptoms. Each patient had between two and four symptoms on admission. The most common symptoms were fever (58.04%), followed by cough (53.93%), myalgia (38.93%), sore throat (30%), and shortness of breath (26.96%). The frequency of the symptoms and the values of the body temperature (p = 0.034), pulse (p = 0.001), and respiratory rate (p = 0.039) varied among the ethnic groups. However, the distinction in the major results of the physical examination did not have a clear clinical value. On average, the SOFA score was approximately equal in the ethnic groups on admission (p = 0.273).
Physical examination. Middle Eastern patients had a considerably higher average BMI compared to other ethnicities (p = 0.033).
Laboratory findings. APTT was longer in East Asians and reached almost the upper limit of the reference range on admission (39.3 ± 4.96 s; p = 0.015). Fibrinogen concentration was also the highest in this ethnic group (p = 0.0045). East Asians had the highest group level of LDH activity (p = 6.77e-05), which is a non-specific biomarker of massive tissue breakdown and a predictor of mortality in COVID-19 patients (Yan et al., 2020). Besides this, East Asians had the highest thrombocyte count (p = 4.42e-06) and a minimal lymphocyte-to-C-reactive protein ratio (LCR) (p = 0.04), both serving as laboratory markers of a high risk of complications. On admission, the mean counts of leucocytes, neutrophils, lymphocytes, and thrombocytes of the ethnic groups were within the reference range. However, the maximal numbers of WBC and neutrophils were noted in the group of South Asian patients (p < 0.011). Patients of Middle Eastern ethnicity had considerably lower counts of thrombocytes (207.0 ± 72.21, p = 0.046), WBC (5.2 ± 3.78, p = 4.58e-05), and neutrophils on admission (2.94 ± 2.43, p = 1.88e-05). In this group, the percentage of people with moderate and severe neutropenia (< 1.0 × 10^9/L) was distinctly higher than in the other groups (5.41%, p = 0.008). This accounted for the lowest neutrophil-to-lymphocyte ratio (NLR) in the Middle Eastern patients (1.89 ± 2.66, p = 0.001). The tendency remained the same at the peak of the disease.
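The derived laboratory indices mentioned above are plain ratios of routinely measured values. The sketch below shows how they could be computed, assuming cell counts are given in 10^9/L and CRP in mg/L (the units are an assumption, as are the function names).

```python
def neutrophil_to_lymphocyte_ratio(neutrophils: float, lymphocytes: float) -> float:
    """NLR: absolute neutrophil count divided by absolute lymphocyte count."""
    return neutrophils / lymphocytes


def lymphocyte_to_crp_ratio(lymphocytes: float, crp: float) -> float:
    """LCR: absolute lymphocyte count divided by C-reactive protein concentration."""
    return lymphocytes / crp


def has_moderate_or_severe_neutropenia(neutrophils: float) -> bool:
    """Threshold quoted in the text: neutrophil count below 1.0 x 10^9/L."""
    return neutrophils < 1.0


# Example patient: neutrophils 3.0, lymphocytes 1.5 (10^9/L), CRP 3.0 (mg/L)
print(neutrophil_to_lymphocyte_ratio(3.0, 1.5))
print(lymphocyte_to_crp_ratio(1.5, 3.0))
print(has_moderate_or_severe_neutropenia(3.0))
```

Because NLR has the neutrophil count in the numerator, a group with more neutropenic patients will tend to show a lower NLR, which is the relationship described for the Middle Eastern patients.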
Disease severity. Almost two-thirds of our cohort (61.25%) were asymptomatic or had mild symptoms. Moderate-to-severe disease was seen in 30.54% of the cohort, and 8.21% were critical. There was a marked disparity in the distribution of severity levels across the ethnic groups (p < 0.0005). In Caucasians, the proportion of patients diagnosed with moderate-to-severe disease was higher than in South Asians and Middle Easterns (38.98% vs. 22.22% and 30.41%) despite the lowest number of comorbidities in the Caucasian group. This was also noticeable in East Asian patients. In contrast, patients from the Middle East had a higher number of comorbidities (chronic cardiac disease 7%, diabetes 25%, smoking 10%) and a substantially higher mean BMI (see above), yet they had a much lower proportion of patients with a critical disease course: 4.73% vs. 11.93% in South Asians and 9.38% in East Asians.
Disease outcome. There was no marked difference between ethnic groups in the primary outcome of COVID-19 (p = 0.147). The overall mortality was 2.68%. Twenty percent of the total cohort required oxygen supplementation, and a lower proportion of patients from the Middle East required ICU admission compared to the other groups (8.11% vs. 12.86% in the overall sample; p = 0.044). The rate of complications was similar in different ethnic groups except for liver dysfunction, which was observed in a higher proportion of East Asian patients (17.02% vs. 9.64% in the total cohort; p = 0.008). Table 5 and Figure 4 display the impurity-based attribute rankings averaged across the four tree-based ML classifiers (Random Forest, AdaBoost, Gradient Boosting, and ExtraTrees).

Classification Concerning Ethnicity With Neural Network
To evaluate the classifier output quality, we trained several ML classification models using a stratified 10-fold cross-validation technique to estimate the true error rate of the models. For each fold, we used 90% of the data to train the model and then tested it on the remaining 10%. The confusion matrices built on the test dataset for all folds were combined and used to calculate the performance metrics. The best performance measures were obtained with a three-layer fully connected neural network (NN). Figure 5 depicts the receiver operating characteristic (ROC) curves for the multi-class classification model. To generalize the area under the ROC curve (AUC) to the multi-class problem, the AUC was computed for all possible pairwise combinations of groups, and their unweighted mean was taken as the metric. In the figure, we also present micro-average (aggregates the contributions of all classes to calculate the metric) and macro-average (computes the metric independently for each class and then takes the average) AUCs. Table 6 lists the confusion matrix of the trained model for each group, indicating true-positive, true-negative, false-positive, and false-negative numbers. Each row of the confusion matrix represents the actual class, while each column shows the instances in a predicted class. Precision, recall, harmonic (F1) score, accuracy, macro average (unweighted mean per class), and weighted average (support-weighted mean per group) of the classification performance are specified in Table 7.
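The evaluation metrics described above can be illustrated in plain Python. The sketch below derives per-class precision, recall, and F1 from a pooled confusion matrix (rows are actual classes, columns are predicted classes) and computes an unweighted mean of pairwise AUCs. The pairwise scheme shown, which averages the two directed AUCs for each pair of classes, is one common reading of the description and is an assumption here, as are all function names.

```python
from itertools import combinations


def per_class_metrics(cm):
    """Precision, recall, F1 and support per class, plus accuracy and averaged F1."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    report = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # actual c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, actually otherwise
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report.append({"precision": precision, "recall": recall,
                       "f1": f1, "support": tp + fn})
    accuracy = sum(cm[c][c] for c in range(n)) / total
    macro_f1 = sum(r["f1"] for r in report) / n
    weighted_f1 = sum(r["f1"] * r["support"] for r in report) / total
    return report, accuracy, macro_f1, weighted_f1


def binary_auc(pos_scores, neg_scores):
    """Probability that a positive outscores a negative (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


def pairwise_mean_auc(y_true, y_score, classes):
    """Unweighted mean AUC over all class pairs; y_score[k][c] scores class c."""
    aucs = []
    for a, b in combinations(classes, 2):
        idx_a = [k for k, y in enumerate(y_true) if y == a]
        idx_b = [k for k, y in enumerate(y_true) if y == b]
        auc_a = binary_auc([y_score[k][a] for k in idx_a],
                           [y_score[k][a] for k in idx_b])
        auc_b = binary_auc([y_score[k][b] for k in idx_b],
                           [y_score[k][b] for k in idx_a])
        aucs.append((auc_a + auc_b) / 2)
    return sum(aucs) / len(aucs)
```

Pooling the fold-wise confusion matrices before computing the metrics, as the text describes, gives every patient exactly one out-of-fold prediction in the final table.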

The Comparison of the Ethnic Groups
An ethnic group is a group of people whose members identify with one another through a common cultural heritage. This term usually reflects a shared culture and social behavior; however, it can also be used to imply variations in genetic makeup between different groups. Several studies showed remarkable ethnicity-related differences in clinical features of various diseases. For example, deaths from hepatitis C were higher among Native Americans and Blacks compared to Caucasians. Studies demonstrated that an immunologic basis can explain this difference (Sugimoto et al., 2003). Moreover, during the 2009/2010 influenza pandemic, differences in mortality rates were observed, where non-Caucasians had a markedly higher mortality rate compared to Caucasians (Zhao et al., 2015).

Genetic Factors
Genetic factors may account for ethnic disparities. However, the variation in genes within ethnic groups can also be high, especially in Arabs, who are unified by the Arabic language rather than a common origin. This ethnic group primarily inhabits the 22 member states of the Arab League in Western Asia and North Africa (Frishkopf, 2010). Though the majority of North Africans speak Arabic as their native language, they have a Berber (not Arab) origin. Accordingly, there are genetic disparities among Arab nationalities despite the cultural, geographic, and linguistic similarities among them. To study Arabian genealogy, geneticists analyze the Y-DNA haplogroup tree (Mahal and Matsoukas, 2018). With this method, they showed that the haplotype of Jordanian Bedouins had its traces in Palestinians, Yemenis, Moroccans, Libyans, and Tunisians. There was a low genetic diversity among these subethnicities (Almahasneh et al., 2018). Another study demonstrated genealogical relatedness between Iraqi and Kuwaiti individuals. A nonsignificant genetic distance was shown for the following ethnic pairs: Northern Iraqi and Lebanese, Kurdish and Iranian, Iraqi and Iranian (Dogan et al., 2017). Analogously, the genetic background of the UAE citizens was influenced by the neighboring countries and remote geographic regions. In males, Y haplogroups had similarities with the Middle Eastern, Central, and South Asian genes. Fifty-two percent of Emirati men had the Middle Eastern haplogroup J, 21% of them inherited the E haplogroup common in West and East Africa, and 14% of the individuals had the R haplogroup that originated from Central and South Asia and Eastern Europe (Daw Elbait et al., 2021). A close genetic distance between inhabitants of distinct Arab countries indicated that they had a common genetic background. This enabled us to analyze data for Arabs without dividing them into subethnicities in the current study.
To identify the genetic background of each individual accurately, we would require expensive genetic testing.
There has been an abundance of publications on COVID-19; however, data on ethnicity and COVID-19 remain limited. Observations from the UK and the USA highlighted increased disease severity and mortality among Blacks, Asians, and Minority Ethnic groups (BAME) (Lab, 2020). Some authors analyzed peer-reviewed literature to study the effect of ethnicity on COVID-19 outcomes and found no differences. The same scientific group inspected preprint articles, some of which suggested poorer outcomes in BAME compared to White patients. These publications compared Caucasians to non-Caucasians, mostly Asians, Blacks, and Hispanics.
No studies ascertained specifically the morbidity and mortality of Middle Eastern patients during the COVID-19 pandemic. We believe our findings are of particular interest as this ethnic group displays a higher risk factor profile, yet fewer patients had a critical disease course compared to other groups. We also identified differences between South Asian and East Asian ethnic groups. These two distinct ethnic groups are often considered as one common group in most publications.
Literature on systemic hypertension and cardiovascular disease reflected considerable variation in disease manifestation, outcome, and response to different pharmacological agents among different ethnic groups. A relevant example is that ACE-I and ARB medications were less effective in reducing blood pressure in patients of African ethnicity. In fact, these patients had worse cardiovascular outcomes when started on ARB monotherapy (Brewster et al., 2016).
The angiotensin-converting enzyme 2 (ACE2) receptor is thought to play a critical role in the pathogenesis of COVID-19 as SARS-CoV-2 uses the ACE2 receptor for cell entry (Hoffmann et al., 2020). The virus also uses transmembrane serine protease (TMPRSS2) and Furin peptidase to invade human cells (Al-Mulla et al., 2021). Studies have reported inconsistent findings on genetic predisposition to and protection against SARS-CoV-2 among ethnic groups in multinational countries. For example, in the USA, African Americans are at a higher risk of COVID-19 compared to other ethnicities that live in the country (Whites, Asians, American Indians, Alaskan Natives, and other minorities). This might be explained by the increased expression of the ACE2 and TMPRSS2 genes in this ethnic group. Additionally, African Americans with asthma are at a greater risk of suffering from the severe form of COVID-19 (Peters et al., 2020). African Americans and Whites have a lower ACE2 expression cell ratio than Asians (Cao et al., 2020). The distribution of ACE1 and ACE2 genotype rates matches the case fatality rate (CFR) in various countries (Gupta and Misra, 2020). The highest CFR (9.6%) is in Europe, followed by North America (5.9%) and Asia (3.5%) (Dongarwar and Salihu, 2020). In a multi-ethnic society, the highest CFR is registered in Blacks (Goldstein and Atherwood, 2020). This suggests gene-environment interactions and ethnic disparities in immune response to COVID-19 (Goldstein and Atherwood, 2020; Nepomuceno et al., 2020). Factors that influence ACE2 receptor expression, such as the use of ACE-I and ARB, are supposed to affect disease course and severity (Madjid et al., 2020). Gene variation can explain the variance in susceptibility to SARS-CoV-2 among ethnicities (Cao et al., 2020).
A thought-provoking question that presents itself is whether patients from the Middle East have an ACE2 receptor morphology that is protective against developing a more severe COVID-19 disease. Researchers investigated whole-exome sequences of individuals from Middle Eastern populations to explore natural variations in the ACE2 gene. They identified two activating variants in the ACE2 gene: K26R and N720D. The variants are more common in Europeans and rare in the Middle Eastern, East Asian, and African populations. The variants change ACE2 gene expression and make people more vulnerable to SARS-CoV-2 infection. Previous studies suggest that K26R can activate ACE2 and facilitate binding to the receptor binding domain while N720D enhances TMPRSS2 protease cutting (Al-Mulla et al., 2021). The K26R variant occurs in European people with a frequency of about 0.5%, which predisposes them to more severe SARS-CoV-2 disease. Another single-nucleotide polymorphism of ACE2 that may genetically protect from SARS-CoV-2 disease is more common in African people, with a frequency of about 0.3% (Calcagnile et al., 2020). In contrast, deleterious variants that suggest a possible decrease in Furin protease function are detected more frequently among Middle Easterns than Europeans (Al-Mulla et al., 2021).

Socioeconomic Factors
Disparities in socioeconomic status (SES) can also account for ethnic and race disparities in COVID-19. Previous research highlighted a strong association between SES and disease outcomes. The ethnic groups with the lower SES are at risk of contracting COVID-19 (Garg, 2020; Kopel et al., 2020). It remains unclear whether this can be explained by a host genetic interaction (e.g., higher prevalence of underlying chronic disease) or non-genetic behavioral factors such as higher-density living, the use of public transportation, and possibly lower health literacy (Singu et al., 2020). Data related to SES in the UAE (e.g., the level of education and the monthly income) are not routinely collected in hospital medical records, so it remains unclear whether SES affected disease severity in our study. However, this seems unlikely as the South Asian patients (who represent 43% of our cohort) were younger and had no considerable comorbidities, but had a similar disease outcome to other ethnic groups.
The UAE is a high-income country that has a high proportion of young people and an imbalance between men and women due to the recruitment of a male workforce (Paulo et al., 2017; Paulo et al., 2018). Such a distribution of males and females can explain the predominance of men admitted to the Mediclinic Parkview hospital, which was used as a research center for our study. Apart from gender and age disparities, the UAE has an uneven distribution of residing nationalities. Emirati citizens make up 11.48% of the population whereas most residents come from India (27.49%) and Pakistan (12.69%), and Egyptians constitute the largest diaspora among non-Emirati Arabs (4.23%) (UAE Population Statistics, 2021).
Although health insurance is mandatory, there is a wide range of insurance providers, and continuity of care for expatriates is not well maintained (Paulo et al., 2017; Paulo et al., 2018). With a new place of employment, an employee gets a new insurance plan (How to Get Health Insurance in the UAE? News, 2021), which depends on the job role and the official monthly income associated with it. The lower the job grade, the narrower the insurance coverage. To improve the situation, some companies have unified the insurance plan for all their employees.

Hematological Abnormalities
COVID-19 can manifest with a profound inflammatory response, which may cause severe immune damage to the lungs. Coronaviruses are able to infect bone marrow cells, which can result in abnormal hematopoiesis (Desai et al., 2021). That is why SARS-CoV-2 infection can cause several hematological abnormalities (Mank et al., 2021). The most common abnormalities in COVID-19 include neutrophilia, lymphopenia, and thrombocytopenia. The WBC count can be normal or decreased upon admission, and it increases with disease progression. Also, an elevation in the WBC count can be caused by co-infections or medications (e.g., prednisone) (Khartabil et al., 2020).
Lymphopenia leads to the dysfunction of the immune system in severe COVID-19 and makes the patients vulnerable to bacterial infections (Sun et al., 2020). Pronounced lymphopenia and thrombocytopenia carry a poor prognosis, especially if accompanied by an elevated D-dimer level (Desai et al., 2021). Both neutrophilia and neutropenia are predictive of poor outcomes and severe respiratory failure in this category of patients (López-Pereira et al., 2020). However, neutropenia is less common in COVID-19. There were only a few reports of a decreased neutrophil count in these patients (Ai et al., 2020; Ahnach et al., 2020; Yarali et al., 2020). The exact reason for neutropenia in the disease remains unknown. The suggested mechanisms of neutropenia development include bone marrow suppression and accelerated peripheral destruction of neutrophils. These mechanisms have been well described in other viral infections including HIV, cytomegalovirus, Epstein-Barr virus, viral hepatitis, and influenza (Munshi and Montgomery, 2000). Both moderate neutropenia (<1,000 cells/µL) and especially severe neutropenia, which is also called agranulocytosis (<500 cells/µL), are conditions with an extraordinary risk of infections. The conditions require patient monitoring and empirical antibiotic therapy along with the administration of granulocyte colony-stimulating factor in some cases (Devi et al., 2021).
The neutrophil-to-lymphocyte ratio (NLR) and the lymphocyte-to-C-reactive protein ratio (LCR) are well-established inflammation markers that reflect the systemic inflammatory response (Lagunas-Rangel, 2020). NLR is a widely used biomarker for assessing the severity of bacterial infections (Naess et al., 2017; Sun et al., 2020). An increase in neutrophil count indicates disease aggravation. A decrease in lymphocyte count denotes impairment in immune functioning (Celikbilek et al., 2013; Huang et al., 2019). NLR is shown to be an independent risk factor of severe COVID-19 (Borges et al., 2020). The ratio increases dramatically in patients with the severe disease form (Lagunas-Rangel, 2020). Lymphocytopenia and the increase in the NLR are the most obvious hematological abnormalities associated with the disease.
A low LCR along with a high NLR suggests a poor prognosis in COVID-19 patients (Lagunas-Rangel, 2020). The LCR can capture the early part of the inflammatory cascade more sensitively than the NLR, as CRP levels have been shown to rise earlier in the course of disease than either neutrophilia or lymphopenia appears. The low LCR and the high NLR observed at different time frames can be regarded as independent predictive markers for in-hospital complications and mortality in COVID-19 patients.
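For illustration, both ratios can be computed directly from the admission laboratory values. The sketch below is a minimal example: the function names, the units (cell counts in 10⁹/L, CRP in mg/L), and the sample values are assumptions made here for demonstration, and the default neutropenia threshold is the <1.0 × 10⁹/L definition used in this study. No prognostic cutoffs for the NLR or LCR are implied.

```python
def nlr(neutrophils, lymphocytes):
    """Neutrophil-to-lymphocyte ratio; both counts in the same units."""
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return neutrophils / lymphocytes


def lcr(lymphocytes, crp):
    """Lymphocyte-to-C-reactive protein ratio.

    Units vary between publications; this sketch assumes lymphocytes in
    10^9/L and CRP in mg/L.
    """
    if crp <= 0:
        raise ValueError("CRP must be positive")
    return lymphocytes / crp


def is_neutropenic(neutrophils, threshold=1.0):
    """Flag neutropenia as an absolute neutrophil count below the threshold (10^9/L)."""
    return neutrophils < threshold


# Hypothetical admission values, invented for illustration
neutrophils, lymphocytes, crp = 6.0, 1.5, 30.0
patient_nlr = nlr(neutrophils, lymphocytes)
patient_lcr = lcr(lymphocytes, crp)
print(f"NLR = {patient_nlr:.2f}, LCR = {patient_lcr:.3f}, "
      f"neutropenic = {is_neutropenic(neutrophils)}")
```

Because the LCR depends on the units chosen for both quantities, values are comparable only between sources that use the same convention.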
In our study, the minimal neutrophil count and the maximal percentage of cases with neutropenia (<1.0 × 10⁹/L) were observed in the group of Middle Eastern patients. Among 10 patients with neutropenia, 2 presented with the severe disease and died, 7 patients had comorbidities, and 3 of them developed complications. The NLR was also minimal in the Middle Eastern group. A rise in the NLR over the disease course as well as high initial levels of the NLR are markers of poor disease outcomes and high mortality. This finding is aligned with the fact that the Middle Eastern group had the lowest number of patients who required intensive care and developed the critical disease.
In the group of East Asians, we observed the minimal values of LCR and the tendency toward the highest NLR. This correlates with the high proportion of patients with the moderate and severe disease and the maximal number of patients who developed liver dysfunction in this ethnic group.
Both LCR and NLR should be interpreted in conjunction with the clinical data to identify patients at risk of poor prognosis of COVID-19. Neutropenic conditions should be followed up to prevent concomitant infections worsening the disease severity.

The Top-Ranked Features of the Model for Classification Concerning Ethnicity
The top-ranked features listed in Table 5 may represent the ethnic-specific response to the disease. Notably, the platelet count was the top-ranked variable in the model that reflects ethnic-specific features of COVID-19. Because of the disturbed coagulation in COVID-19 patients, there are considerations for the potential role of platelet function and/or platelet activation in the disease severity (Larsen et al., 2020). Furthermore, APTT is also a valuable feature of the classification model (the 7th out of a total of 38). The WBC count and the level of lymphocytes on admission are also among the top-ranked attributes. These facts support the hypothesis that some mechanisms of the immune response to COVID-19 are specific to the ethnicity of the patient. Meanwhile, lymphopenia is known to be an essential clinical feature in patients with severe SARS-CoV-2 infection (Zheng et al., 2020).
The activity of LDH enzyme ranks 5th among the most valuable predictors. The biochemical constants (e.g., total bilirubin and creatinine concentration) may account for genetic-based differences in the enzyme regulations and metabolism. The presented symptoms, SOFA, and GCS scores are at the bottom of the list of the valuable features; i.e., the clinical appearance of COVID-19 is not specific to the patient's ethnicity.

The Classification Model and Its Performance
To check the quality of the outcome of the supervised ML model, we employed several algorithms and compared their performance. The NN outperformed all the other methods. We tuned parameters of the model in terms of the number of hidden layers and neurons, optimizer, and hyperparameters and built the three-layer fully connected NN. It showed up to 90% averaged accuracy in the classification by the ethnic group.
The high accuracy of the model supports our hypothesis of the occurrence of ethnic-specific features and patterns in the dataset. As seen from the error matrix (Table 6) and performance matrix (Table 7), the best performance is shown for the most numerous class of South Asians. The highest rate of false-positive values was obtained for the Middle Eastern class, which comes second in terms of the number of patients. The misclassification can be explained by some similarities between the two classes rather than overfitting of the ML algorithm.
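The relation between a confusion (error) matrix and the per-class performance measures can be sketched as follows. The function names and the two-class toy matrix are hypothetical and were invented for illustration; they are not the values of Table 6 or Table 7.

```python
def per_class_metrics(matrix):
    """Compute precision, recall, F1, and support for every class.

    matrix[i][j] is the number of samples of actual class i that were
    predicted as class j.
    """
    n = len(matrix)
    report = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                        # actual c, predicted other
        fp = sum(matrix[r][c] for r in range(n)) - tp   # predicted c, actual other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report.append({"precision": precision, "recall": recall,
                       "f1": f1, "support": tp + fn})
    return report


def macro_and_weighted(report, key):
    """Return the unweighted and the support-weighted mean of one metric."""
    total = sum(row["support"] for row in report)
    macro = sum(row[key] for row in report) / len(report)
    weighted = sum(row[key] * row["support"] for row in report) / total
    return macro, weighted


# Toy two-class matrix invented for illustration
toy_matrix = [[30, 10],
              [6, 54]]
report = per_class_metrics(toy_matrix)
macro_recall, weighted_recall = macro_and_weighted(report, "recall")
accuracy = sum(toy_matrix[i][i] for i in range(2)) / sum(map(sum, toy_matrix))
print(report, macro_recall, weighted_recall, accuracy)
```

In this formulation the support-weighted recall coincides with the overall accuracy, whereas the macro average gives every class the same weight regardless of its size. This distinction matters when the classes are unbalanced, as they are in the present cohort.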
To assess the performance of the model, we built the ROC curves for each class separately and calculated the corresponding micro- and macro-averaged AUCs. Figure 5 indicates the high performance of the model with regard to each ethnic group. The micro-averaged curve and its AUC indicate high overall performance, as they are calculated globally across all classes.
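The two ways of generalizing the AUC to several classes can be sketched in a few lines. The example below is illustrative only: it uses the rank-based definition of the binary AUC, a pairwise (one-vs-one) unweighted mean, and a pooled micro-average. The function names, the three class labels, and the predicted probabilities are invented for demonstration and do not reproduce the values of Figure 5.

```python
from itertools import combinations


def binary_auc(scores_pos, scores_neg):
    """Probability that a random positive outscores a random negative.

    Ties count as one half.
    """
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(scores_pos) * len(scores_neg))


def pairwise_macro_auc(y_true, proba, classes):
    """Unweighted mean of the AUCs over all pairs of classes."""
    pair_aucs = []
    for a, b in combinations(range(len(classes)), 2):
        rows_a = [p for y, p in zip(y_true, proba) if y == classes[a]]
        rows_b = [p for y, p in zip(y_true, proba) if y == classes[b]]
        # Score each pair twice: once with a as positive, once with b
        auc_ab = binary_auc([p[a] for p in rows_a], [p[a] for p in rows_b])
        auc_ba = binary_auc([p[b] for p in rows_b], [p[b] for p in rows_a])
        pair_aucs.append((auc_ab + auc_ba) / 2)
    return sum(pair_aucs) / len(pair_aucs)


def micro_auc(y_true, proba, classes):
    """Pool every (sample, class) pair into a single binary problem."""
    pos, neg = [], []
    for y, p in zip(y_true, proba):
        for k, c in enumerate(classes):
            (pos if y == c else neg).append(p[k])
    return binary_auc(pos, neg)


# Toy labels and predicted probabilities invented for illustration
classes = ["A", "B", "C"]
y_true = ["A", "A", "B", "B", "C", "C"]
proba = [
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.2, 0.6, 0.2],
    [0.3, 0.4, 0.3],
    [0.1, 0.2, 0.7],
    [0.2, 0.2, 0.6],
]
macro_value = pairwise_macro_auc(y_true, proba, classes)
micro_value = micro_auc(y_true, proba, classes)
print(f"pairwise macro AUC = {macro_value:.3f}, micro AUC = {micro_value:.3f}")
```

The micro-average is dominated by the largest classes, so a high micro-averaged AUC by itself does not guarantee good discrimination for a small group.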

CONCLUSION
• In our cohort, Caucasian or East Asian COVID-19 patients tended to have a more severe disease despite a lower risk profile. In contrast, Middle Eastern COVID-19 patients had a higher risk factor profile, but they did not differ markedly in disease severity from the other ethnic groups.
• The accurate ethnicity classification model, which is based on the laboratory, physical, and clinical findings, reveals the presence of ethnic-specific features of COVID-19.
• The high performance of the ML NN method applied to the classification by ethnic group from the laboratory and clinical findings supports the occurrence of features and patterns that are specific to ethnicity. This may impact the development of medical treatment and protocols based on ethnic background.
• Larger studies are needed to explore the role of ethnicity in COVID-19 disease features.

LIMITATIONS
One of the major strengths of the study was the recruitment of a cohort reflective of all adult age groups. This enabled us to calculate actual risk estimates. The second strength is that all the patients diagnosed with COVID-19 were hospitalized regardless of their disease severity. Diagnostics were performed in full accordance with the common "National Guidelines for Clinical Management and Treatment of COVID-19" (National Emergency Crisis and Disasters Management Authority, 2020), which provided us with a unique study cohort representative of the adult population.
The current study has several limitations. First, it is a single-center study in the Emirate of Dubai, which is the most populated city in the UAE with the highest percentage of expatriates (91%), and it does not cover other cities such as Ras-Al-Khaimah where expatriates make up 69% of the population. Thus, UAE nationals might be underrepresented in this cohort.
Second, we were unable to assess the possible impact of socioeconomic factors. The relationship between ethnic background, socioeconomic status, and health outcome is complex and multidimensional. Data on socioeconomic status in the UAE (e.g., the level of education and the monthly income) are not routinely collected in hospital medical records and are therefore missing from the dataset analyzed. Although we acknowledge the influence of socioeconomic status on the health of people of different ethnicities, its effect could not be estimated within the society of Dubai. The information on personal income does not reflect the spectrum of expenditures by an individual. Thus, the above-mentioned factors should be the focus of a separate study on economics and public health. The health insurance plan is not a valid marker of socioeconomic status in the UAE (see the Discussion section), and the UAE government provided free medical care to all COVID-19 patients during the study period.
Third, there is no reliable and affordable tool for segregating examinees into ethnic groups and subethnicities. Apart from geographic and cultural similarities, ethnicities have common genes that were not analyzed in the current study. Since the data on the patients' Y haplogroups were not available, we divided the study cohort by geographic location. Large ethnic groups were used in this study. This allowed us to build accurate classification models that supported an association between the disease course and ethnicity. However, we were unable to analyze statistics on COVID-19 in distinct nationalities as the corresponding subgroups were small and unbalanced.
Fourth, although scientists pay much attention to the association of genetic (e.g., ACE2) factors with the COVID-19 severity and outcomes, the settings of our study did not allow us to focus on this aspect. Genetic tests are quite expensive procedures and are not covered by health insurance. During the first wave of the COVID-19 outbreak, genetic factors were not the focus of research activities. As the pandemic evolves, the analysis of such factors may be helpful for the healthcare sector in multinational countries including the UAE.

DATA AVAILABILITY STATEMENT
The datasets generated for this study are available on request at the site of Big Data Analytics Center (https://bi-dac.com/covid19-dubai-dataset/).

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Mediclinic Middle East Research and Ethics Committee (MCME REC) (reference number MCME.CR.104.MPAR.2020), Dubai Scientific Research Ethics Committee (DSREC), and Dubai Health Authority (protocol number DSREC-05/2020_25). Written informed consent for participation was not required for this study using secondary deidentified data in accordance with the national legislation and the institutional requirements.

AUTHOR CONTRIBUTIONS
ML, NS, RS, DK, and SN collected the dataset. FA and YS wrote the manuscript. TH performed the statistical analysis, prepared the figures and tables for data presentation and illustration. TT analyzed the hematological findings and contributed to writing Results and Discussion sections. NZ, TL, and DS contributed to the literature review and data analysis. All authors contributed to the article and approved the submitted version.