Deep Learning-Based Text Emotion Analysis for Legal Anomie

Text emotion analysis is an effective way to analyze the emotions underlying subjects' anomie behaviors. This paper proposes a text emotion analysis framework based on word embedding and splicing, called the Bi-direction Convolutional Word Embedding Classification Framework (BCDF). BCDF can express the word vectors in a text and embed part-of-speech tagging information as a feature of the sentence representation. In addition, an emotional parallel learning mechanism is proposed, which uses the temporal information of the parallel structure calculated by Bi-LSTM to update the storage unit through a gating mechanism. The convolutional layer can better extract certain components of sentences (such as adjectives, adverbs, and nouns) that play a more significant role in the expression of emotion. To take advantage of convolution, a Convolutional Long Short-Term Memory (ConvLSTM) network is designed to further improve the classification results. Experimental results show that, compared with the traditional LSTM model, the proposed text emotion analysis model improves the F1 score by 3.3% and 10.9% on psychological and news text datasets, respectively. The proposed BCDF model based on Bi-LSTM and ConvLSTM has great value in practical applications of anomie behavior analysis.


INTRODUCTION
Anomie behavior refers to a disordered social phenomenon caused by the anomie state of the current law in the process of the transformation of a country's new and old systems (Liu et al., 2021). Anomie is a phenomenon caused by the disintegration or transformation of the social structure; as the social structure disintegrates, anomie and deviant behavior increase. Under the market economy, the interweaving of various contradictions, the friction of various phenomena, and the confrontation of various behaviors have led to more and more "anomies" and aroused people's general concern. Thinking about anomie behavior and seeking countermeasures has become the main topic of current behavioral law research (Fukuda et al., 2021). Teenagers are prone to behavioral deviation due to the influence of various factors such as psychology, physiology, and living environment (Fukuda et al., 2021). If these deviations are not addressed in time, they will lead to more serious anomie behavior. Social work adheres to the concept of helping others to help themselves and should actively intervene in teenagers' anomie behavior. In recent years, cases of juvenile anomie have occurred frequently, and their social harm is serious. Therefore, it is the responsibility of adolescent social workers to help anomic adolescents return to normal life and to resolve potential safety hazards in society (Angioletti et al., 2022).
As computerized text analysis methods, deep learning models are currently applied by researchers in many studies in the field of psychological text analysis (Phan and Rauthmann, 2021). In the field of psychological counseling research, deep learning models can be used to explore the topics of conversation between counselors and clients during the counseling process, compare the similarity of different treatment categories, and code behavior. In social media and mental health, deep learning-based classification models can be used to identify and predict various psychological disorders and to estimate personality traits. We focus on improving deep learning algorithms and applying them to explore the psychological connotations in journalistic text.
Text emotion analysis is a common application of natural language processing (NLP) methods (Ozawa, 2021). Its goal is to identify the emotional polarity (positive, negative, or neutral) or emotional intensity of a given text or sentence segment. Emotion analysis is mostly used in product review analysis and public opinion monitoring, which supports business decision-making and public opinion guidance for government organs and organizations. Previous studies mostly focused on the construction of artificial dictionaries and manual feature extraction. However, the construction of an emotional dictionary is time-consuming, laborious, and difficult to maintain, and manual feature extraction requires expert domain knowledge (Sun et al., 2022). Word vector technology, which has sprung up in recent years, has become a fundamental technology in natural language processing. However, popular word vector models are mainly obtained by learning context information; that is, they focus on semantic information rather than the emotional information required by emotion analysis tasks. As an improvement over the RNN (Wang et al., 2012), the long short-term memory model (LSTM; Wang et al., 2012) can make better use of long-distance dependence information in sequence data, which makes it suitable for text emotion classification.
The convolutional neural network (CNN; Tan et al., 2021) is an important feature extraction model for psychological text. Because of its strong local feature extraction ability, it has achieved good results in the field of text classification. However, in text classification tasks with huge amounts of data and many categories, the traditional CNN model exposes the shortcomings of low computational efficiency, slow training speed, and susceptibility to overfitting, which degrade classification performance. Therefore, how to optimize the CNN model structure or improve the model algorithm to effectively solve the problem of large-scale text classification is the focus of deep learning classification model research (Agga et al., 2021). When dealing with short-text emotion classification, CNN uses a convolution layer to extract local features and a max pooling layer to select the maximum value of local features, which easily ignores the long-term sequential characteristics of texts. This paper uses a new deep learning model, the Convolutional Long Short-Term Memory (ConvLSTM; Wang et al., 2021) network, which replaces the max pooling layer in CNN with an LSTM to reduce the loss of local information and capture the long-term dependence in sentence sequences.
Previous studies have shown that the fusion of different text features can provide more information to the classifier. However, how to construct the word embedding method and the classifier structure, and how to integrate different features so that the proposed model achieves a better classification effect, is a focus of this paper. In other words, when judging the emotional polarity of sentences, we should not only combine more text word information but also construct a classifier structure that better extracts the different features in the text.
In this paper, we propose a word embedding and splicing mechanism based on Bi-LSTM and ConvLSTM, called the Bi-direction Convolutional Word Embedding Classification Framework (BCDF). It can not only express the word vectors in the text, but also embed part-of-speech tagging information as a feature of the sentence representation. In addition, an emotional parallel attention mechanism is proposed, which uses the temporal information of the parallel structure calculated by Bi-LSTM to update the storage unit through a gating mechanism. The convolutional layer can better extract certain components of sentences (such as adjectives, adverbs, and nouns) that play a more significant role in the expression of emotion. On this basis, a ConvLSTM network is designed to further improve the classification results. Intuitively, some words in a sentence are its key words; the ConvLSTM mechanism helps extract this kind of focus and better represent it in the features. Experimental results show that, compared with benchmark models, this method achieves better classification performance in terms of average accuracy and F1 score. The main contributions of this work are as follows:
(1) We propose a new word embedding and splicing mechanism, BCDF, based on Bi-LSTM and ConvLSTM. BCDF can express the word vectors in the text and embed part-of-speech tagging information as a feature of the sentence representation.
(2) We use an emotional parallel attention mechanism that calculates the temporal information of the parallel structure to update the storage unit through the gating mechanism.
(3) To take advantage of the strong feature extraction ability of the convolutional layer, we design a ConvLSTM network to capture the features of word vectors and further improve the classification results.
(4) We conduct comprehensive experiments to evaluate the performance of BCDF. The evaluation results indicate that BCDF achieves the highest F1 score for text analysis compared with traditional methods.
The structure of this paper is as follows: Section 2 introduces the related work on news and psychological text analysis and their impact on anomie behaviors. Section 3 introduces the proposed BCDF model for word embedding and emotion classification.
Section 4 presents the experiment settings and results. Section 5 concludes this paper.

RELATED WORK
We analyze the related work from two aspects: the application of machine learning-based text analysis in the legal anomie analysis area, and the development of intelligent text emotion analysis in the psychological health analysis area.
The Application of Machine Learning-Based Text Analysis
Hamilton and Davison (2022) discussed some of the legal and ethical issues that come with machine learning in the text analysis context, as well as some suggestions for managers to use in determining the suitability of machine learning projects. Lai and Tan (2019) utilized deception detection as a testbed to see how machine learning models' explanations and predictions might be leveraged to improve human performance while preserving human agency, and showed that there is a trade-off between human performance and human agency and that explanations of machine predictions can help to mitigate this trade-off. Tung (2019) brought the legal function up to date with today's omnipresent transformations. It also serves as a reminder to business leaders that the legal function, such as corporate legal strategists, will be required to drive and maintain change at the convergence of law, business, and technology. Verma et al.

The Development of Intelligent Text Emotion Analysis in the Psychological Health Analysis Area
Tate et al. (2020) developed a model that can predict mental health problems in mid-adolescence and investigated whether machine learning techniques would outperform logistic regression, finding that it may be unnecessary for similar studies to forgo logistic regression in favor of other, more complex methods. Ji et al. (2022) showed that the motivation to avoid unbearable psychological pain, coupled with a decision-making bias toward underestimating the value of life, is a strong predictor of suicide attempts. The analysis also showed that the underlying mechanisms of suicidal ideation and attempted suicide differ, because suicidal ideation is more closely related to despair. Mohr et al. (2017) critically reviewed the research on personal sensing related to mental health, focusing mainly on smartphones but also covering wearable devices, social media, and computers.

Bi-LSTM-Based Word Embedding and Representation
In general, the true meaning of metaphors depends not only on their original meaning, but also on the preceding and following words. To synthesize the meaning of the expression words, the word vectors of the static sense and the text sense are used as the original representation of the input sequence. The Bi-directional Long Short-Term Memory (Bi-LSTM) model is used to learn the text representation of the journalistic and psychological words. Given an input sentence with N words, T = {a_1, a_2, ..., a_N}, the two types of word vectors are combined to represent the original meaning of word m_i, as shown in Eq. 1.
where f_i is a static word embedding. Based on the words m_i (i = 1, 2, ..., N), we use a Bi-LSTM sequential encoder to generate contextual word representations. Structurally, an LSTM unit consists of a gate structure, in which the input gate controls the information that is input into the memory cell, the forget gate determines which information is discarded from the memory cell, and the output gate determines which information is output from the memory cell. In addition, the cell state records all the useful historical information up to the current moment. Using the forward LSTM and the text before the current word in the sentence, the word representation of x_i is calculated by Eqs. 2-7.
where x_i and o_i represent the input and the output gates, respectively, d_i is the memory content, d̃_i is the new memory content, and h_i is the hidden output of the forward LSTM. W_z, W_o, W_f, and W_c are the weight parameters of the current input a_i; G_z, G_o, G_f, and G_c are the weights of the hidden layer state h_{i-1}; and v_o, v_z, v_f, and v_c are the bias values of the output gate, input gate, forget gate, and hidden layer state, respectively. tanh and σ represent the hyperbolic tangent and sigmoid functions, respectively.
Using the backward LSTM, the text representation is calculated based on the contextual words following a_i in each sentence, as shown in Eq. 8.
where ←θ_i denotes all the parameters of the backward LSTM unit. Equation 9 summarizes the calculation steps of the forward LSTM, where θ_i denotes the parameters of the forward LSTM.
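The bidirectional encoding described above can be sketched in NumPy as follows. The parameter names (W for input weights, G for recurrent weights, v for biases) follow the paper's notation, but the dimensions, random initialization, and toy input are illustrative assumptions rather than the paper's actual configuration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(a, h_prev, d_prev, p):
    """One LSTM step (Eqs. 2-7): input gate x, forget gate f, output gate o."""
    x = sigmoid(p["Wz"] @ a + p["Gz"] @ h_prev + p["vz"])        # input gate
    f = sigmoid(p["Wf"] @ a + p["Gf"] @ h_prev + p["vf"])        # forget gate
    o = sigmoid(p["Wo"] @ a + p["Go"] @ h_prev + p["vo"])        # output gate
    d_new = np.tanh(p["Wc"] @ a + p["Gc"] @ h_prev + p["vc"])    # new memory content
    d = f * d_prev + x * d_new                                   # memory content
    h = o * np.tanh(d)                                           # hidden output
    return h, d

def bilstm_encode(seq, p_fwd, p_bwd, hidden):
    """Run forward and backward LSTMs over the word sequence and concatenate
    their hidden states into one contextual representation per word."""
    def run(params, words):
        h, d, outs = np.zeros(hidden), np.zeros(hidden), []
        for w in words:
            h, d = lstm_step(w, h, d, params)
            outs.append(h)
        return outs
    fwd = run(p_fwd, seq)
    bwd = run(p_bwd, seq[::-1])[::-1]   # backward pass, re-aligned to word order
    return [np.concatenate([hf, hb]) for hf, hb in zip(fwd, bwd)]

def init_params(emb, hidden, rng):
    p = {}
    for k in "zfoc":
        p["W" + k] = rng.standard_normal((hidden, emb)) * 0.1
        p["G" + k] = rng.standard_normal((hidden, hidden)) * 0.1
        p["v" + k] = np.zeros(hidden)
    return p

rng = np.random.default_rng(0)
sentence = [rng.standard_normal(8) for _ in range(5)]   # 5 words, 8-dim embeddings
reps = bilstm_encode(sentence, init_params(8, 4, rng), init_params(8, 4, rng), 4)
print(len(reps), reps[0].shape)   # one representation per word, each 2*hidden-dim
```

Each word representation is the concatenation of the forward state (context before the word) and the backward state (context after it), which is what the framework feeds into the subsequent classification layers.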

Convolutional Long Short-Term Memory-Based Word Embedding Classification

Bi-Direction Convolutional Word Embedding Classification Framework
The proposed classification framework BCDF consists of two stacked ConvLSTM layers, one Flatten layer, one Dropout layer, and one Dense layer. The role of the Dropout layer is to prevent overfitting, and the dropout rate is 0.3. The last Dense layer uses the Softmax activation function. The input of BCDF is a word vector containing the contextual information of the middle word, called a ContWord. A ContWord is denoted as B_j = {b_1^j, b_2^j, ..., b_L^j} (L = 400), where j refers to the j-th ContWord and L refers to the number of sampling points contained in a ContWord.
The essence of ConvLSTM is the same as that of LSTM: the output of the previous layer is used as the input of the next layer. Unlike the classical LSTM, ConvLSTM adds convolution operations. Therefore, ConvLSTM can capture temporal relationships while extracting spatial features like a convolutional layer. In ConvLSTM, the transitions between states are also replaced by convolution calculations. Equations 10-14 show the state transitions of the ConvLSTM.
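In the standard ConvLSTM formulation, the state transitions take the following form (a reconstruction from the commonly used definition, consistent with the symbols defined below; Eqs. 10-14 are assumed to correspond to these):

$$I_t = \sigma\left(W_{xi} * \chi_t + W_{hi} * H_{t-1} + W_{ci} \circ C_{t-1} + b_i\right)$$
$$F_t = \sigma\left(W_{xf} * \chi_t + W_{hf} * H_{t-1} + W_{cf} \circ C_{t-1} + b_f\right)$$
$$C_t = F_t \circ C_{t-1} + I_t \circ \tanh\left(W_{xc} * \chi_t + W_{hc} * H_{t-1} + b_c\right)$$
$$o_t = \sigma\left(W_{xo} * \chi_t + W_{ho} * H_{t-1} + W_{co} \circ C_t + b_o\right)$$
$$H_t = o_t \circ \tanh(C_t)$$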
where H, W, ∗, and ∘ represent a hidden state, a filter, the convolution operator, and the Hadamard product, respectively. I_t, χ_t, F_t, C_t, o_t, and b denote the input gate, the input, the forget gate, the cell state, the output gate, and a bias, respectively. In BCDF, the number of cells in each layer is shown in Table 1. The training parameter settings of BCDF are shown in Table 2.
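A minimal single-channel, 1-D NumPy sketch of one ConvLSTM step is shown below: the matrix products of a classical LSTM are replaced by convolutions (∗), and the state updates use the Hadamard product (∘, elementwise). The kernel size, the single-channel simplification, and the omission of the peephole terms (W_ci ∘ C) are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv_same(x, w):
    """1-D 'same' convolution used for all gate computations."""
    return np.convolve(x, w, mode="same")

def convlstm_step(x_t, h_prev, c_prev, k):
    """One ConvLSTM state transition with convolutional gates."""
    i = sigmoid(conv_same(x_t, k["wxi"]) + conv_same(h_prev, k["whi"]) + k["bi"])  # input gate I_t
    f = sigmoid(conv_same(x_t, k["wxf"]) + conv_same(h_prev, k["whf"]) + k["bf"])  # forget gate F_t
    o = sigmoid(conv_same(x_t, k["wxo"]) + conv_same(h_prev, k["who"]) + k["bo"])  # output gate o_t
    c = f * c_prev + i * np.tanh(conv_same(x_t, k["wxc"]) + conv_same(h_prev, k["whc"]) + k["bc"])
    h = o * np.tanh(c)                                                             # hidden state H_t
    return h, c

rng = np.random.default_rng(1)
L = 16                                   # spatial length of one input row (toy value)
kernels = {name: rng.standard_normal(3) * 0.1
           for name in ["wxi", "whi", "wxf", "whf", "wxo", "who", "wxc", "whc"]}
kernels.update({b: np.zeros(L) for b in ["bi", "bf", "bo", "bc"]})

h, c = np.zeros(L), np.zeros(L)
for t in range(5):                       # run 5 time steps over random inputs
    h, c = convlstm_step(rng.standard_normal(L), h, c, kernels)
print(h.shape)                           # the hidden state keeps the spatial shape
```

Because the gates are convolutions, the hidden and cell states retain the spatial layout of the input at every step, which is what lets ConvLSTM extract local spatial features while still modeling the temporal sequence.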

EXPERIMENTAL DESIGN AND RESULTS ANALYSIS
The experimental device is a computer equipped with an NVIDIA GeForce GTX 950M GPU with 3,049 MB of memory. Two datasets are used to evaluate the proposed BCDF model: the emotional analysis dataset published at the Audio/Visual Emotion Challenge and Workshop'19 (AVEC'19; Ringeval et al., 2019) and Google's GoEmotions dataset (Demszky et al., 2020). The experiment uses four metrics to evaluate the classification performance of BCDF: the F1 score (F1), recall rate (Rec), precision rate (Pre), and overall accuracy rate (Acc), defined as Pre = tp/(tp + fp), Rec = tp/(tp + fn), F1 = 2 · Pre · Rec/(Pre + Rec), and Acc = (tp + tn)/(tp + tn + fp + fn), where tp denotes true positives, fn false negatives, tn true negatives, and fp false positives.
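These four metrics can be computed directly from the confusion-matrix counts. The sketch below uses the standard definitions (it assumes there is at least one positive prediction and one positive label, so the denominators are nonzero):

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute precision, recall, F1, and accuracy from confusion-matrix counts."""
    pre = tp / (tp + fp)                      # precision: correct positives / predicted positives
    rec = tp / (tp + fn)                      # recall: correct positives / actual positives
    f1 = 2 * pre * rec / (pre + rec)          # harmonic mean of precision and recall
    acc = (tp + tn) / (tp + tn + fp + fn)     # overall accuracy over all samples
    return pre, rec, f1, acc

# Hypothetical counts: 80 true positives, 20 false positives, 70 true negatives, 30 false negatives
pre, rec, f1, acc = classification_metrics(80, 20, 70, 30)
print(round(pre, 3), round(rec, 3), round(f1, 3), round(acc, 3))  # 0.8 0.727 0.762 0.75
```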

Emotional Word Recognition of Different Components of a Sentence
To verify the effectiveness of the proposed BCDF model, it is compared with state-of-the-art methods on the two datasets, AVEC and GoEmotions. Table 3 shows the results of emotional word recognition in different components of a sentence. From Table 3, we can see that RNN performs worse than the other models, and CNN + RNN is better than RNN alone. The reason may be that the CNN + RNN model can extract both contextual and temporal features from a long sentence. LSTM performs similarly to the CNN + RNN model, while CNN + LSTM performs better than LSTM. The proposed BCDF achieves the best performance compared with the four traditional models. The F1 scores of BCDF on Data1 and Data2 are 75.6% and 69%, respectively, which are 1% and 2% higher than those of CNN + LSTM on Data1 and Data2. The proposed BCDF model outperforms the compared traditional methods on all psychological texts. Compared with classical methods, the BCDF method identifies journalistic words in news more easily.

Psychological Word Recognition Evaluation
To evaluate the proposed BCDF model, a psychological word recognition experiment is conducted on the GoEmotions datasets.
We can see from Table 4 that BCDF performs the best on both GoEmotions datasets for recognizing psychological words, with F1 scores of 81.6% and 88.3%, respectively.

Comparing the Classification Performance of the Bi-Direction Convolutional Word Embedding Classification Framework and the Long Short-Term Memory Model
To demonstrate the superiority of the ConvLSTM layer in BCDF, the experiment first compares BCDF with an LSTM-based model on the GoEmotions dataset. Figure 1 shows the performance comparison between BCDF and the LSTM-based model. As can be seen from Figure 1, the Acc of BCDF is 0.1% higher than that of the LSTM-based model, and the F1 scores of BCDF are 0.8% and 0.5% higher than those of the LSTM-based model, respectively.

CONCLUSION
This paper proposed a text emotion analysis framework, BCDF, based on word embedding and splicing. It can express the word vectors in a text and embed part-of-speech tagging information as a feature of the sentence representation. A Bi-LSTM emotional parallel attention mechanism is also presented. As the convolutional layer can better extract certain components of sentences (such as adjectives, adverbs, and nouns), a ConvLSTM network is designed to further improve the classification results. Experimental results showed that the proposed text emotion analysis model improves the F1 score by 3.3% and 10.9% on psychological and news text datasets, respectively. In the future, the time and space efficiency of the proposed BCDF will be improved by using advanced lightweight convolutional techniques.

DATA AVAILABILITY STATEMENT
The original contributions presented in this study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.