AUTHOR=Toliyat Amir, Levitan Sarah Ita, Peng Zheng, Etemadpour Ronak

TITLE=Asian hate speech detection on Twitter during COVID-19

JOURNAL=Frontiers in Artificial Intelligence

VOLUME=Volume 5 - 2022

YEAR=2022

URL=https://www.frontiersin.org/journals/artificial-intelligence/articles/10.3389/frai.2022.932381

DOI=10.3389/frai.2022.932381

ISSN=2624-8212

ABSTRACT=COVID-19 emerged in Wuhan, China, in late 2019 and, after spreading quickly through
Asian countries, rapidly reached the rest of the world. Governments worldwide declared
public health crises and took drastic measures to contain the spread of the disease.
The pandemic has affected the lives of millions of people.
Many people have lost their lives or jobs, and citizens are going through a wide range of
emotions, such as disbelief, shock, concerns about health, fear about food supplies,
anxiety, and panic. These developments led to the
spread of racism and hate against Asians in Western countries, especially in the United
States. Statistics show that anti-Asian hate crime in 16 of America’s largest cities
increased by 149% in 2020 [1]. In this study, we first established a baseline for detecting
Americans’ hate against Asians on Twitter. We then presented an approach to balance the
biased dataset, which consequently improved tweet classification performance. We also
downloaded 10 million tweets through Twitter API V-2; this study uses a small portion of
them, and we will use the entire dataset in the experiments of our future work. In this
paper, 3,000 tweets (downloaded using Twitter API V-2) were annotated by three Asian
annotators and one Asian-American annotator.
We trained both classical machine learning and deep learning models as classifiers.
Our machine learning methods include Random Forest [2, 3],
K-Nearest Neighbors (KNN) [4], Support Vector Machine (SVM) [5, 6], Extreme Gradient
Boosting (XGBoost) [7], Logistic Regression [8], Decision Tree [9], and Naive Bayes [8]. Our
deep learning models include a basic Long Short-Term Memory (LSTM) network [10],
Bidirectional LSTM [11], Bidirectional LSTM with dropout [12], a convolutional network [13],
and Bidirectional Encoder Representations from Transformers (BERT) [14]. We also
tuned our dataset by varying the required agreement between annotators and the
corresponding Fleiss' kappa value. Our final results showed that Logistic Regression
performed best among the machine learning models, with an accuracy of 0.80, and BERT
performed best among the deep learning models, with an F1-score of 0.85.
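
The abstract mentions tuning the dataset through annotator agreement and Fleiss' kappa but does not spell out the procedure. As a minimal sketch of what such a step could look like, the Python below computes Fleiss' kappa for a set of tweets labeled by four annotators and then recomputes it after keeping only tweets whose majority label reaches a vote threshold. The label names ("hate", "none"), the toy ratings, and the helper `filter_by_agreement` with its `min_votes` parameter are illustrative assumptions, not details taken from the paper.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a list of items, each a list of labels (one per rater).

    Assumes every item has the same number of raters and that more than one
    category is actually used (otherwise the denominator is zero).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    # counts[i][j] = number of raters who gave item i the category j
    counts = [[Counter(item)[c] for c in categories] for item in ratings]
    # Observed agreement per item, averaged over all items
    p_items = [
        (sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items
    # Chance agreement from the overall proportion of each category
    total = n_items * n_raters
    p_cats = [sum(row[j] for row in counts) / total for j in range(len(categories))]
    p_exp = sum(p * p for p in p_cats)
    return (p_bar - p_exp) / (1 - p_exp)


def filter_by_agreement(ratings, min_votes):
    """Hypothetical tuning step: keep items whose top label has >= min_votes votes."""
    return [r for r in ratings if Counter(r).most_common(1)[0][1] >= min_votes]


# Toy annotations from four annotators (illustrative data only)
CATEGORIES = ["hate", "none"]
ratings = [
    ["hate", "hate", "hate", "hate"],
    ["none", "none", "none", "none"],
    ["hate", "hate", "none", "none"],
    ["hate", "none", "none", "none"],
]

kappa_all = fleiss_kappa(ratings, CATEGORIES)
kappa_filtered = fleiss_kappa(filter_by_agreement(ratings, min_votes=3), CATEGORIES)
print(f"kappa (all tweets): {kappa_all:.3f}")
print(f"kappa (>= 3 of 4 agree): {kappa_filtered:.3f}")
```

On this toy data, dropping the evenly split tweet raises kappa, which illustrates the trade-off between label reliability and dataset size that a stricter agreement threshold implies.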