The Similarities and Distances of Growth Rates Related to COVID-19 Between Different Countries Based on Spectral Analysis

The COVID-19 pandemic has taken more than 1.78 million lives across the globe. Identifying the underlying evolutive patterns between different countries would help us single out the mutated paths and behavior of this virus. I devise an orthonormal basis which serves as the features to relate the evolution of one country's cases and deaths to another's via coefficients from the inner product. Then I rank the coefficients obtained from the inner products with the featured frequencies. The distances between these ranked vectors are evaluated by the Manhattan metric. Afterwards, I associate each country with its nearest neighbor, which shares the evolutive pattern, via the distance matrix. Our research shows that such patterns are not random at all, i.e., the underlying patterns could be attributed to some factors. In the end, I perform the typical cosine similarity analysis on the time-series data. The comparison shows that our mechanism differs from the typical one but is also related to it in some way. These findings reveal the underlying interaction between countries with respect to cases and deaths of COVID-19.


INTRODUCTION
The COVID-19 pandemic is still ongoing and is spreading across all the continents (1,2). The spread of this pandemic has been studied by many researchers (3)(4)(5)(6). There are many ways to look into the behaviors of the virus or of the pandemic itself (7,8), for example with a view to the efficacy of travel bans or vaccines (9). Some studies have even established relations between cases and deaths of COVID-19 from demographic, economic, and social perspectives (10). In this article, I devise an orthonormal basis (11) $B_N$, which is motivated by Fourier analysis (12) and can thus take the underlying frequencies of the data into consideration. I use the COVID-19 database (13), which records weekly COVID-19 cases and deaths, taking the data from Week 15 to 51 (37 weeks in total). By filtering out non-essential data (countries), I obtain 90 countries as our research targets. By calculating the 36 growth rates (one for each interval between consecutive weeks from Week 15 to 51) of the cases and of the deaths for each of the 90 countries, I obtain the input vectors. I transform each input vector into a set of coefficients, namely its inner products with the vectors of $B_{36}$, and then rank the coefficients by positive integers from 1 to 36. The ranks indicate the strength of the relation between the input vector and the underlying frequencies: a larger coefficient is assigned a larger positive integer. In this way I obtain a 90 × 36 coefficient matrix, where 90 is the number of sampled countries and 36 is the number of frequencies (or the length of the input vector). Then I use the Manhattan metric (14) to measure the distances between all the ranked vectors, which yields a 90 × 90 distance matrix. Afterwards, I associate each country with its nearest neighbor via the minimal distance in the distance matrix.
In the end, I rerun our data with another typical approach, cosine similarity, which can be calculated either from the original time-series data or from the transformed frequency coefficients; since the basis is orthonormal, both produce identical results by the properties of the inner product. The relation between these two approaches is also examined via the Jaccard Index (15). Our research shows that the patterned evolutive correlation between countries is not random, i.e., there are some fundamental factors that contribute to such relations. The research also reveals that the correlated patterns for cases and for deaths between countries bear no similarity at all. This indicates a strong discrepancy between the evolution of cases and that of deaths.
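Why cosine similarity is the same on the data and on its coefficients follows from orthonormality alone. A short derivation, writing $B_N = \{b_1, \dots, b_N\}$ and, for illustration only, $c(x) = (\langle x, b_1 \rangle, \dots, \langle x, b_N \rangle)$ for the coefficient vector of $x$, and using $\langle b_j, b_l \rangle = \delta_{jl}$:

```latex
\begin{aligned}
x &= \sum_{j=1}^{N} \langle x, b_j \rangle\, b_j, \qquad
y = \sum_{l=1}^{N} \langle y, b_l \rangle\, b_l, \\
\langle x, y \rangle &= \sum_{j=1}^{N} \sum_{l=1}^{N} \langle x, b_j \rangle \langle y, b_l \rangle \langle b_j, b_l \rangle
  = \sum_{j=1}^{N} \langle x, b_j \rangle \langle y, b_j \rangle
  = \langle c(x), c(y) \rangle, \\
\cos(x, y) &= \frac{\langle x, y \rangle}{\sqrt{\langle x, x \rangle}\,\sqrt{\langle y, y \rangle}}
  = \frac{\langle c(x), c(y) \rangle}{\lVert c(x) \rVert\, \lVert c(y) \rVert}
  = \cos\bigl(c(x), c(y)\bigr).
\end{aligned}
```

The ranking step is not linear, so the ranked Manhattan distance enjoys no such invariance; this is why the two approaches can disagree.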

METHODOLOGY AND PROCEDURES
I devise a class of orthogonal bases, which serve as our feature extractors. A complete set of procedures is then described in this section.

Orthogonal Basis
Motivated by the Fourier series and the Fourier transform, I devise an orthonormal basis that is easier to adopt and whose analysis is much more intuitive to interpret, since it involves only real numbers rather than complex numbers, which normally make the analyzed results harder to interpret.
Let $\mathbb{N}$ denote the set of positive integers. Suppose v is a vector whose elements are all non-negative integers. $v_i$ denotes its i-th element and $|v|$ denotes its length. Let us assume $|v| = N + 1$, where N stands for a natural number throughout this article. I also consider its growth vector, whose entries are the growth rates between consecutive elements of v; observe that the growth vector has length N. This growth vector is our main research target, since I study the (weekly) growth rates of cases and deaths regarding COVID-19. Later on, I tweak the definition of the growth vector slightly to fit our analytical purpose. For any two vectors v and w, I use $\langle v, w \rangle$ to denote their inner product. Define real By some algebraic manipulation, $B_N$ can be proven to be an orthogonal basis for every natural number N.
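The exact definition of $B_N$ is not reproduced in this text. As a minimal sketch of what a real-valued, Fourier-motivated orthonormal basis looks like, the code below uses the standard real Fourier basis (a constant vector, cosine/sine pairs, and for even N an alternating vector) as a stand-in; this choice, and the function names `real_fourier_basis`, `inner`, and `coefficients`, are assumptions for illustration and may differ from the author's $B_N$.

```python
import math


def real_fourier_basis(n):
    """Return n orthonormal real vectors of length n.

    Stand-in for B_N (assumption): the standard real Fourier basis, i.e. a
    constant vector, cosine/sine pairs, and, for even n, the alternating
    (highest-frequency) vector. The paper's own B_N may be defined differently.
    """
    basis = [[1 / math.sqrt(n)] * n]  # zero frequency
    scale = math.sqrt(2 / n)
    for j in range(1, (n - 1) // 2 + 1):  # frequencies strictly below n/2
        basis.append([scale * math.cos(2 * math.pi * j * k / n) for k in range(n)])
        basis.append([scale * math.sin(2 * math.pi * j * k / n) for k in range(n)])
    if n % 2 == 0:
        basis.append([(-1) ** k / math.sqrt(n) for k in range(n)])  # frequency n/2
    return basis


def inner(v, w):
    """Standard inner product <v, w> of two real vectors."""
    return sum(a * b for a, b in zip(v, w))


def coefficients(v, basis):
    """Coefficients of v with respect to the basis: <v, b_j> for each b_j."""
    return [inner(v, b) for b in basis]
```

For the 36 growth rates `g` of one country, `coefficients(g, real_fourier_basis(36))` gives 36 values that can then be ranked. Restricting to real cosines and sines keeps every coefficient a real number, in line with the motivation above.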

Procedures
In this section, I describe a procedure to analyze M × (N + 1) time-series data (in the form of a matrix), where M is the number of sets (subjects) and N + 1 is the number of points of time.
The purpose of adding 1 is to simplify our further analysis, which utilizes the differences (i.e., the N intervals). The analytical steps are as follows:
1. Specify the M subjects to be studied (for example, countries) and N + 1 points of time (for example, weeks). Then collect the sets of time-series data, which can then be represented as an M × (N + 1) matrix. Here (for our analytical purpose) 1 is deliberately added to the denominator to avoid division by zero.
FIGURE 1 | Inner product of case and death growth rates and featured frequencies, which is calculated in Table 1 for Afghanistan (AFG) and Algeria (DZA). (A) Inner product of case growth rates and featured frequencies for Afghanistan (AFG) and Algeria (DZA). (B) Inner product of death growth rates and featured frequencies for Afghanistan (AFG) and Algeria (DZA).

5. Calculate the distances between all the ranked vectors via the Manhattan metric d among all the M subjects, which results in a distance matrix $[d(R_{B_N}(v_k), R_{B_N}(v_h))]_{k,h=1}^{M}$.
6. Find the minimal pairs (or nearest neighbors) for all the subjects, i.e., the pairs with the least distance in the above distance matrix.
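Steps 5 and 6 can be sketched as follows, assuming the ranked vectors are plain lists of equal length. The function names are illustrative; keeping ties (so a subject may have several nearest neighbors) and an `ignore` set for subjects whose data must be skipped are design choices made for this sketch, not details taken from the procedure.

```python
def manhattan(p, q):
    """Manhattan (L1) distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))


def distance_matrix(ranked):
    """Step 5: M x M matrix of Manhattan distances between ranked vectors."""
    return [[manhattan(p, q) for q in ranked] for p in ranked]


def nearest_neighbors(dist, ignore=()):
    """Step 6: for each subject k, the set of subjects h != k at minimal distance.

    Ties are kept. Subjects in `ignore` are neither paired nor used as neighbors.
    """
    m = len(dist)
    result = {}
    for k in range(m):
        if k in ignore:
            continue
        candidates = [h for h in range(m) if h != k and h not in ignore]
        if not candidates:
            continue
        best = min(dist[k][h] for h in candidates)
        result[k] = {h for h in candidates if dist[k][h] == best}
    return result
```

With 90 ranked vectors of length 36, `distance_matrix` returns the 90 × 90 matrix, and `nearest_neighbors` lists each subject's closest partners.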

RESULTS
Following Section 2, I carry out the data analysis and present the results in this section. I download the historical weekly data (up to Week 51, 2020) of the reported COVID-19 cases and deaths worldwide. In order to avoid biased sampling, I filter the data according to the following criteria: 1. among all the countries, only those with populations of more than 10 million are included; 2. only data from Week 15 to 51 of 2020 are taken as samples.
First of all, the global weekly COVID-19 data are read from the source file (13) and stored in a matrix DT of size 9,152 × 10. After filtering out the non-essential samples by the above criteria, I obtain 90 countries (with abbreviated country codes and corresponding labels), as shown in Table 1, each of which contains 37 weekly data points (from Week 15 to 51). Furthermore, each country is represented by a 37 × 2 matrix, whose 2 columns (weekly cases and weekly deaths) are chosen out of the original ten columns. Examples of such matrices for AFG and DZA are listed in Table 2.
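The filtering criteria can be sketched as below. The record layout (`Row` and its field names) is hypothetical, since the column names of the source file are not given here, and requiring a record for every week in the range is an extra assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """One weekly record. Field names are hypothetical, not the source file's columns."""
    country: str
    population: int
    year: int
    week: int
    cases_weekly: int
    deaths_weekly: int


def select_countries(rows, min_population=10_000_000, year=2020,
                     first_week=15, last_week=51):
    """Map each qualifying country to its list of (cases, deaths) per week.

    Keeps countries with population above min_population and, as an extra
    assumption, only those with a record for every week in the range.
    """
    weeks = range(first_week, last_week + 1)
    by_country = {}
    for r in rows:
        if r.population > min_population and r.year == year and r.week in weeks:
            by_country.setdefault(r.country, {})[r.week] = (r.cases_weekly, r.deaths_weekly)
    return {
        country: [per_week[w] for w in weeks]
        for country, per_week in by_country.items()
        if len(per_week) == len(weeks)
    }
```

Each value in the returned mapping is a 37-entry list of (cases, deaths) pairs, matching the 37 × 2 matrix per country described above.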
Data for other countries are omitted here for lack of space. In the table, "1c" and "2c" denote the COVID-19 cases in AFG and DZA, respectively, and "1d" and "2d" denote the COVID-19 deaths in AFG and DZA, respectively. Based on this table, I calculate the weekly growth rates for cases and deaths by the formula where Week(n) denotes the growth rate for cases or deaths at Week n. Observe that 1 is added to the denominator to avoid an infinite growth rate. Examples of the case and death growth rates for AFG and DZA are presented in Table 3.
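The growth-rate formula itself is not reproduced in this text. A minimal sketch, assuming the form $(v_{i+1} - v_i)/(v_i + 1)$, which matches the description of a difference divided by a denominator increased by 1, is:

```python
def growth_vector(v):
    """Growth rates of a series of N + 1 counts, giving N values.

    Assumed form: g_i = (v[i+1] - v[i]) / (v[i] + 1); the +1 in the
    denominator keeps the rate finite when a week's count is 0.
    """
    if len(v) < 2:
        raise ValueError("need at least two time points")
    return [(b - a) / (a + 1) for a, b in zip(v, v[1:])]
```

Applied to the 37 weekly counts of one country, this yields the 36 growth rates used as the input vector.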
Based on this table and the featured frequencies (vectors), i.e., the orthonormal basis $B_N$ ($B_{36}$ in our case), one can then calculate the coefficients (inner products), as shown for AFG and DZA in Table 4, in which the meaning of $b_j$ is explained in Section 2.1. Next I rank the coefficients: a higher coefficient is assigned a higher positive integer. The assignment for each country (only AFG and DZA are presented here) is shown in Table 5. The distances between the ranked vectors of different countries can then be calculated by the Manhattan metric. The results are shown in Table 6.
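The ranking step can be sketched as follows: each coefficient is replaced by its rank from 1 to n, the largest receiving n. The text does not say how ties are handled, so breaking ties by position is an assumption of this sketch, and the function names are illustrative.

```python
def rank_vector(coeffs):
    """Replace each coefficient by its rank 1..n (largest coefficient -> n).

    Ties are broken by position, the earlier index getting the lower rank;
    this tie rule is an assumption, since the text does not specify one.
    """
    order = sorted(range(len(coeffs)), key=lambda j: coeffs[j])  # stable sort
    ranks = [0] * len(coeffs)
    for rank, j in enumerate(order, start=1):
        ranks[j] = rank
    return ranks


def rank_matrix(coefficient_rows):
    """Rank each row, e.g. a 90 x 36 coefficient matrix -> 90 x 36 rank matrix."""
    return [rank_vector(row) for row in coefficient_rows]
```

For example, `rank_vector([0.3, -1.2, 2.5])` gives `[2, 1, 3]`. Only the order of the coefficients survives, which is what makes the subsequent Manhattan distances insensitive to their magnitudes.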
Based on these distance matrices, one could associate each country with its nearest neighbor(s) with respect to cases and deaths. The results are presented in Table 7.
In the table, "Cty" stands for Country. Since the number of deaths for Country 14 is 0, the associated values are ignored whenever it is involved. Some countries may be associated with more than one country.

Comparison
Here I utilize another typical approach, namely cosine similarity, to compare our method with others. Although cosine similarity is widely used in many fields, it pays less attention to some internal structures. For example, let $p = (5, 4)$, $q = (-4, 5)$, and $r = (1, -\frac{5}{4})$. Then $\cos(p, q) = \cos(p, r) = 0$. But with our ranked Manhattan metric d, $d(p, q) = 2$ and $d(p, r) = 0$. Moreover, ranking the coefficients tends to reduce the noise in the data, which matters here because the cases and deaths are affected by many factors. The cosine similarities for the 90 countries (except for Country 14, which is ignored for deaths because its number of deaths is zero) are presented in Table 8. Again, by linking each country to the neighbor with the maximal cosine similarity, one obtains Table 9.
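The p, q, r example can be checked directly. The sketch below treats the three vectors as coefficient vectors, ranks them (ties broken by position, an assumption), and compares cosine similarity with the ranked Manhattan distance. Function names are illustrative. Note that cosine similarity is undefined for an all-zero vector, such as the growth vector of a death series that stays at zero.

```python
import math


def cosine(p, q):
    """Cosine similarity; undefined (raises) for an all-zero vector."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0 or norm_q == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_p * norm_q)


def rank_vector(v):
    """Ranks 1..n, the largest entry receiving rank n (ties broken by position)."""
    order = sorted(range(len(v)), key=lambda j: v[j])
    ranks = [0] * len(v)
    for rank, j in enumerate(order, start=1):
        ranks[j] = rank
    return ranks


def ranked_manhattan(p, q):
    """Manhattan distance between the rank vectors of p and q."""
    return sum(abs(a - b) for a, b in zip(rank_vector(p), rank_vector(q)))


p, q, r = (5, 4), (-4, 5), (1, -5 / 4)
print(cosine(p, q), cosine(p, r))                      # 0.0 0.0
print(ranked_manhattan(p, q), ranked_manhattan(p, r))  # 2 0
```

Cosine similarity cannot tell q and r apart relative to p, whereas the ranked distance separates them because r preserves the ordering of p's entries and q reverses it.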

Optimal Pairings
In this section, I list and compare the optimal minimal and maximal pairs from Tables 7 and 9. The results are shown in Table 10. I apply the Jaccard Index $J(A, B) = \frac{|A \cap B|}{|A \cup B|}$, where A and B are sets, to analyze their relation.
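A minimal sketch of the Jaccard comparison, assuming each table of pairings is encoded as a mapping from a country index to its set of neighbors (a representation chosen here for illustration). Pairs are treated as unordered, so "A paired with B" and "B paired with A" count once; returning 1.0 for two empty sets is a convention of this sketch.

```python
def pair_set(neighbors):
    """Turn {subject: set of nearest neighbors} into a set of unordered pairs."""
    return {frozenset((k, h)) for k, hs in neighbors.items() for h in hs}


def jaccard(a, b):
    """Jaccard index: size of the intersection over size of the union.

    Returns 1.0 when both sets are empty (a convention, not from the text).
    """
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```

For instance, `jaccard(pair_set(ranked_pairs), pair_set(cosine_pairs))` compares the pairings from the two methods, where `ranked_pairs` and `cosine_pairs` are hypothetical mappings built from Tables 7 and 9.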
Frontiers in Public Health | www.frontiersin.org

distance matrix. Finally, I compare our mechanism with the usual cosine similarity analysis. The result shows that these two approaches yield quite different results; this indicates that our approach provides another aspect from which to look into the evolution of COVID-19. The comparison also reveals several points: first, the evolutive patterns for cases and deaths are very different, as concluded from Table 11; second, regardless of cases or deaths, our method and the typical one are highly related to each other; and third, the relation between the paired countries, no matter which approach one adopts, is not random, since the ratios of pairs formed are very high. This indicates that our research provides some insight into the structure of the evolution of COVID-19 between countries. However, some of the results about causal relations in this study might not agree with other studies (10). This is reasonable, since the approach I adopt focuses more on feature detection than on causal relation finding. In future research, one could look into the pairs to identify the fundamental factors that contribute to such correlated patterns between countries. Furthermore, one could also delve into the shift of phases of the frequencies by lifting the constraint on weekly growth rates. This might yield an even more dynamic picture of the evolution.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author/s.