Multiclass classification using quantum convolutional neural networks with hybrid quantum-classical learning

Multiclass classification is of great interest for various applications; for example, it is a common task in computer vision, where one needs to categorize an image into three or more classes. Here we propose a quantum machine learning approach based on quantum convolutional neural networks for solving the multiclass classification problem. The corresponding learning procedure is implemented via TensorFlowQuantum as a hybrid quantum-classical (variational) model, where quantum output results are fed to the softmax activation function with the subsequent minimization of the cross-entropy loss via optimizing the parameters of the quantum circuit. Our conceptual improvements include a new model for a quantum perceptron and an optimized structure of the quantum circuit. We use the proposed approach to solve a 4-class classification problem for the MNIST dataset using eight qubits for data encoding and four ancilla qubits; previous results have been obtained for 3-class classification problems. Our results show that the accuracies of our solution are similar to those of classical convolutional neural networks with comparable numbers of trainable parameters. We expect that our findings provide a new step towards the use of quantum neural networks for solving relevant problems in the NISQ era and beyond.


Introduction
Quantum computing is now widely considered a new paradigm for solving computational problems that are believed to be intractable for classical computing devices [1][2][3][4][5]. The idea behind quantum computing is to use quantum physics phenomena [2], such as superposition and entanglement. Specifically, in the quantum gate-based model, quantum algorithms are implemented as a sequence of logical operations on qubits (quantum analogs of classical bits), which compose the corresponding quantum circuits terminated by qubit-selective measurements [3].

Examples of problems for which quantum speedups are expected to be exponential are prime factorization [4] and simulating quantum systems [5], for example, modelling complex molecules and chemical reactions [6]. The amount of computing power needed for such applications, however, greatly exceeds the resources of currently available quantum computing devices. For example, factoring a 2048-bit RSA key requires 20 million noisy qubits [7], whereas currently available noisy intermediate-scale quantum (NISQ) devices have about 50-100 qubits [8]. Quantum computing can also be considered in the context of data processing [9] and machine learning applications [10], where the resources required for solving practical problems are expected to be lower. Still, the caveats of quantum machine learning are related to the input/output problem [11]: although quantum algorithms can provide sizable speedups for processing data, they do not provide advantages in reading classical input data. The cost of reading the input may then in some cases dominate over the advantage of quantum algorithms. Various approaches have been suggested, specifically, amplitude encoding [12], but the problem of converting classical data into quantum data in the general case remains open [11].
The quantum-classical (variational) model has emerged as a leading strategy for the use of NISQ devices [13,14]. In such a framework, a classical optimizer is used to train a parameterized quantum circuit [13]. This helps to address the constraints of current NISQ devices, specifically, limited numbers of qubits and noise processes that limit circuit depths. An interesting link between the quantum-classical (variational) model and architectures of artificial neural networks opens up prospects for the use of such an approach for machine learning problems [15][16][17][18][19][20][21][22]. The workflow of variational quantum algorithms, where the parameters of the circuit are iteratively updated (optimized), resembles classical learning procedures [19].
A cornerstone problem of various machine-learning-based approaches is classification, which is why it has been widely considered from the viewpoint of potential speedups using quantum computing. As has been demonstrated in Refs. [9,23], kernel-based quantum algorithms may provide efficient solutions for the classification problem. Specifically, the quantum version of the support vector machine [9] can be used as an optimized binary classifier with complexity logarithmic in the size of the vectors and the number of training examples. A distance-based quantum binary classifier has been proposed in Ref. [24]. Alternative versions of binary quantum classifiers have been considered in Refs. [25][26][27][28][29] (for a review, see also Ref. [30]). A natural next step is to consider multiclass classification, which has been addressed recently in Ref. [31] with the demonstration of the performance on the IBMQX quantum computing platform. This method uses single-qubit encoding and amplitude encoding with embedding of data, so the obtained results are of quite high accuracy for the 3-class classification task. Very recently, an approach based on the quantum convolutional neural network (QCNN) [32] has been used for binary classification, although a way to extend it to the multiclass classification case has been discussed. We also note that some of the proposed quantum machine learning algorithms have been tested in practically relevant settings, for example, analyzing NMR readings [33,34] with a trapped-ion quantum computer, learning for the classification of lung cancer patients [35] and classifying and ranking DNA to RNA transcription factors [36] using a quantum annealer, weather forecasting [37] on the basis of a superconducting quantum computer, and many others [38].

FIGURE 1 General structure of the proposed quantum neural network, consisting of several steps: preliminary scanning using n-qubit filters, pooling, and regular layers.

Frontiers in Physics frontiersin.org
In this work, we present a quantum multiclass classifier that is based on the QCNN architecture. The developed approach follows the traditional design of convolutional neural networks, in which a few fully connected layers are placed after several convolutional layers. The corresponding learning procedure is implemented via TensorFlowQuantum [39] as a hybrid quantum-classical (variational) model, where quantum output results are fed to the softmax activation function, with subsequent minimization of the cost function via optimization of the parameters of the quantum circuit. We then discuss the modification of a quantum perceptron, which enables us to obtain highly accurate results using quantum circuits with a relatively small number of parameters. The obtained results demonstrate successful solution of the classification problem for 4 classes of MNIST images.
Our paper is organized as follows. In Section 2, we present the general description of the proposed quantum algorithm that is used for multiclass classification. In Section 3, we provide a detailed discussion of the layers of the proposed quantum machine learning algorithm. In Section 4, we demonstrate the results of the implementation of the proposed algorithm for multiclass image classification for hand-written digits from MNIST and clothes images from fashion MNIST datasets. We conclude in Section 5.

General scheme
The core concept that we use here is the hybrid (variational or quantum-classical) approach (for a review, see Refs. [13,14]). This approach uses parametrized (variational) quantum circuits, where the exact parameters of quantum gates within the circuit can be changed. The general structure of our variational circuit is shown in Figure 1. Below we describe the proposed approach for multiclass classification based on the quantum-classical approach.
At the first step, we realize an amplitude encoding of the input data, in our case, MNIST images. In fact, due to the high cost of this step [11], we generate a set of encoding circuits and store their parameters and structure in memory, thereby making a quantum version of the dataset. We consider MNIST images, which are rescaled from 28 by 28 to 16 by 16 pixels, and, thus, 8 qubits are needed. In terms of the corresponding qubit states, encoded images can be expressed as follows:

$$|\psi_k\rangle = \sum_{m=0}^{N} C^k_m |m\rangle,$$

where k is the index of the image, $|m\rangle$ is a register of 8 qubits that encodes the index m, and N = 255. The coefficients $C^k_m$ are equal to the elements of the normalized flattened vectors of images. In general, this approach enables us to pack a vector of N double-precision numbers into $\log_2(N)$ qubits, and, thus, significantly reduce the size of the processed data. It should be noted, however, that existing algorithms for amplitude encoding scale exponentially with N; further study is needed to overcome this problem.
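The classical preprocessing behind amplitude encoding can be sketched as follows. This is a minimal illustration, assuming a random 16 by 16 array in place of a rescaled MNIST image; the function name `amplitude_encode` is hypothetical, and the synthesis of the actual state-preparation circuit is not shown.

```python
import numpy as np

def amplitude_encode(image):
    """Flatten an image and normalize it so that it is a valid state vector."""
    vec = np.asarray(image, dtype=np.float64).ravel()
    norm = np.linalg.norm(vec)
    if norm == 0.0:
        raise ValueError("an all-zero image cannot be amplitude encoded")
    size = int(vec.size)
    n_qubits = size.bit_length() - 1
    if 2 ** n_qubits != size:
        raise ValueError("the number of pixels must be a power of two")
    return vec / norm, n_qubits

# Hypothetical stand-in for an MNIST digit rescaled to 16 by 16 pixels.
rng = np.random.default_rng(0)
image = rng.random((16, 16))

amplitudes, n_qubits = amplitude_encode(image)
print(n_qubits, amplitudes.shape)
```

The 256 pixel values become the amplitudes of an 8-qubit register, and the squared amplitudes sum to one, as required for a quantum state.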
We first employ the amplitude encoding procedure [12], where ancilla qubits are used for one-hot encoding of the class of target images. Preliminary analysis of encoded images is performed with 3 convolutional layers with filter sizes equal to 4, 3, and 2, respectively. Each such layer consists of 2 sublayers that are needed to maintain translational invariance (at least partially), and all the filters of the same size contain identical trainable parameters, as is the case for classical convolutional neural networks (CCNN). We note that for filters of size 3 we need a virtual qubit, which is always set to zero; such a trick is needed to fit the filter into 8 qubits in a translationally invariant manner. The convolutional layer with pooling is then placed after the preliminary layers; at this step the first reduction of the required qubit number is realized.
As in the classical setup, several fully connected layers are added after convolutional layers (9 layers in our case). The further reduction of qubit numbers is realized after regular layers and subsequent pooling are done (in the same way as it is done after convolutional filters).
The final filter is needed for mixing the information from the two parts of the divided circuit. In the process of learning, the output of the final filter comes to contain the codes of the classes: |00〉, |01〉, |10〉, and |11〉. The output cascade contains four Toffoli gates, which activate the corresponding ancilla qubit; at the end of the quantum circuit, the class of the image is one-hot encoded by the ancillas. Measurement results of the ancilla qubits are passed to the softmax activation function. The categorical cross-entropy is then used as the cost function. The gradients of the cost function with respect to the parameters of the gates are then calculated using the parameter shift rule. New parameters of the quantum gates are obtained by the gradient descent step. The detailed structure of all layers is described below.
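The classical part of this loop can be illustrated with a short sketch. It assumes four hypothetical ancilla readout values and a toy one-parameter expectation value, cos(θ), which is the Z expectation after an R_Y(θ) rotation of |0〉; the two-term shift of π/2 is the standard form of the parameter shift rule for gates generated by a Pauli operator and is not taken from the text.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1D array."""
    z = np.exp(x - np.max(x))
    return z / z.sum()

def categorical_cross_entropy(probs, one_hot):
    """Cross-entropy between predicted probabilities and a one-hot label."""
    return -float(np.sum(one_hot * np.log(probs + 1e-12)))

def expectation(theta):
    """Toy model: <Z> after R_Y(theta) acts on |0>, which equals cos(theta)."""
    return np.cos(theta)

def parameter_shift(f, theta, shift=np.pi / 2):
    """Two-term parameter shift estimate of df/dtheta."""
    return (f(theta + shift) - f(theta - shift)) / 2.0

# Hypothetical measurement results of the four ancilla qubits.
readouts = np.array([0.9, 0.1, 0.05, 0.2])
label = np.array([1.0, 0.0, 0.0, 0.0])  # one-hot label of the first class

probs = softmax(readouts)
loss = categorical_cross_entropy(probs, label)

theta = 0.3
grad = parameter_shift(expectation, theta)  # equals -sin(theta) for this model
print(probs, loss, grad)
```

For this toy expectation value the shifted difference reproduces the analytic derivative exactly, which is why the rule can replace finite differences when gradients are evaluated on quantum hardware.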

Structures of layers
Here we present the detailed description of the layers that are used in our quantum machine learning algorithm.

Preliminary scanning using n-qubit filters
The structure of 4-qubit filters is presented in Figure 2A. The $R_Y(\Theta_1)$, $R_Y(\Theta_2)$, $R_Y(\Theta_3)$, and $R_Y(\Theta_4)$ rotations are added in order to rotate each of the four qubits separately. We propose to use controlled parameterized rotations $R_Y(\Phi)$ for the entanglement, which is an essential new element in the structure of the quantum perceptron. We note that in Ref. [31] the authors use the standard controlled X gates for this purpose. Here, as we demonstrate, the parameterized entanglement scheme provides higher accuracy of image classification due to the more flexible learning algorithm.
In classical machine learning, the output of the linear perceptron is passed through a certain non-linear function, which is essential for the learning process. In the quantum case, instead of summations of neurons we use entanglement of qubits. The degree of entanglement is controlled by the parameters Φ, which makes the learning process more flexible, and, thus, the classification procedure may become more accurate. In fact, many classical activation functions like sigmoid or tanh behave akin to switches: their values change from 0 to 1 or from −1 to 1 in a certain region. In the quantum domain, we can switch from a separable (non-entangled) to an entangled state, which could play the role of non-linearities in classical learning. Thus, individual rotations followed by the parameterized entanglement can be considered as an analog of the perceptron with the non-linearity.
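How the parameter Φ tunes the degree of entanglement can be checked numerically on two qubits. The sketch below is an illustration under stated assumptions: the control qubit is prepared in an equal superposition with a Hadamard gate (a choice made here for clarity, not part of the described filter), and entanglement is quantified by the concurrence of the resulting pure state.

```python
import numpy as np

def ry(angle):
    """Single-qubit rotation around the Y axis."""
    c, s = np.cos(angle / 2), np.sin(angle / 2)
    return np.array([[c, -s], [s, c]])

def controlled(u):
    """Two-qubit gate applying u to the second qubit when the first is |1>."""
    gate = np.eye(4)
    gate[2:, 2:] = u
    return gate

def concurrence(state):
    """Concurrence of a pure two-qubit state with amplitudes (a, b, c, d)."""
    a, b, c, d = state
    return 2 * abs(a * d - b * c)

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

def entangle(phi):
    """Apply a Hadamard to the control, then controlled-R_Y(phi), to |00>."""
    state = np.zeros(4)
    state[0] = 1.0
    state = np.kron(H, np.eye(2)) @ state
    return controlled(ry(phi)) @ state

for phi in (0.0, np.pi / 2, np.pi):
    print(phi, concurrence(entangle(phi)))
```

For this input state the concurrence equals |sin(Φ/2)|, so the state moves continuously from separable at Φ = 0 to maximally entangled at Φ = π, whereas a fixed controlled X gate offers no such tunable degree of freedom.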
After 4-qubit scanning, smaller-scale filters are applied to analyze the obtained quantum feature map in more detail. The structure of layers with 3-qubit filters is presented in Figure 2B. In order to rotate the 3 individual qubits, $R_Y(\Theta_1)$, $R_Y(\Theta_2)$, and $R_Y(\Theta_3)$ gates are added; as in the case of 4-qubit filters, the individual rotations are thus performed by parameterized $R_Y$ gates. We note that even in the case of 3-qubit filters, we use the entanglement of 4 qubits. Even though the entanglement of 3 qubits looks more intuitive in this case, as we show below, the 4-qubit one provides more accurate results in image recognition. More detailed scanning of images is performed by a layer with 2-qubit filters; the corresponding circuit is given in Figure 2C (also see Ref. [32]). As in all previous cases, we use 4-qubit entanglement, and the filter consists of 4 individual rotations with additional entanglement by CNOT gates. The idea of using 4-qubit entanglement is inspired by classical CNNs, where new feature maps are generated by summing the previous feature maps contracted with weights and then applying a non-linearity.

Quantum convolutional neural network layer with pooling
After the preliminary scanning step, the obtained quantum state of 8 qubits contains the encoding of feature maps. The role of the next layer (see Figure 3) is to analyze these maps in more detail and pick out the most important of them. The scheme of the layer is given in Figure 3B, where the convolutional filter is the same as in Figure 3A. We note that in the pooling circuit, the controlled $R_Z$ rotation is activated if the first qubit is in state 1, while the controlled $R_X$ gate is used when the upper qubit is in state 0. This is conceptually similar to the structure proposed in Ref. [32].
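The action of such a pooling block on a control-target pair can be written as a single two-qubit unitary. The sketch below is a minimal illustration assuming the first qubit is the control and that the two rotation angles are independent trainable parameters; the names `pooling_unitary`, `theta_z`, and `theta_x` are hypothetical.

```python
import numpy as np

def rx(angle):
    """Single-qubit rotation around the X axis."""
    c, s = np.cos(angle / 2), np.sin(angle / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])

def rz(angle):
    """Single-qubit rotation around the Z axis."""
    return np.array([[np.exp(-1j * angle / 2), 0],
                     [0, np.exp(1j * angle / 2)]])

# Projectors onto the |0> and |1> states of the control qubit.
P0 = np.array([[1, 0], [0, 0]], dtype=complex)
P1 = np.array([[0, 0], [0, 1]], dtype=complex)

def pooling_unitary(theta_z, theta_x):
    """R_Z on the target if the control is |1>, R_X if the control is |0>."""
    return np.kron(P1, rz(theta_z)) + np.kron(P0, rx(theta_x))

U = pooling_unitary(0.7, 1.1)
print(np.round(U, 3))
```

The resulting matrix is block diagonal, with the R_X block acting when the control is 0 and the R_Z block when it is 1, so the state of the target after the block depends on the control qubit.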

Regular layers
Similarly to the CCNN case, several regular layers are placed after the convolutional layers. In our case we add 8 layers, as shown in Figure 3A. In order to get more accurate results, the double entanglement is added after individual rotations. The second reduction of the qubit number in the circuit is done by two pooling procedures, as in the case of convolutional layers. In order to obtain the required structure, we add a final filter at the end of the quantum circuit. As shown below, the use of the final filter is essential for obtaining more accurate results of image classification.

Toffoli and controlled rotation gates
The practical realization of high-fidelity two-qubit operations on quantum hardware is still a challenging task. The situation is typically more difficult for three-qubit gates, such as the Toffoli gate. Thus, it is necessary to decompose these gates into single- and two-qubit gates, which can be practically performed. The general algorithm for n-controlled rotations is presented in Ref. [40], and for the case of a single-controlled rotation it can be expressed as shown in Figure 4A. In order to implement the Toffoli gate, we consider the qubit inversion as a rotation around the X or Y axis, and in our case the doubly-controlled $R_Y(\Theta)$ gate is used with the value of Θ = 2π. The circuit is presented in Figure 4B, and it corresponds to the representation of a sum of parameterized n-controlled rotations, which are considered in Ref. [40]. The Toffoli gate, in fact, can be considered as a sum of such rotations with n = 2, where the Θ angles of all rotations, except the one that is controlled by the 11 combination of the control qubits, are set to zero. The α angles are defined along the lines of the procedure of Ref. [40]; they are obtained from the Θ angles by a simple matrix transformation.

FIGURE 3 In (A) the convolutional layer with pooling is shown. In (B) the structure of regular layers is illustrated.
We note that multiqubit gate decomposition can be further improved using qudits, which are multilevel quantum systems. As it has been shown, the upper levels of qudits can be used instead of ancilla qubits in the decomposition [41-45].
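The decomposition of a single-controlled rotation into single- and two-qubit gates, mentioned earlier in this section, can be verified numerically. The sketch below uses a standard textbook identity with two CNOT gates and two half-angle rotations; it is given as an illustration and is not claimed to coincide with the exact circuit of Figure 4A or with the general procedure of Ref. [40].

```python
import numpy as np

def ry(angle):
    """Single-qubit rotation around the Y axis."""
    c, s = np.cos(angle / 2), np.sin(angle / 2)
    return np.array([[c, -s], [s, c]])

# CNOT with the first qubit as control and the second as target.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)

def controlled_ry(theta):
    """Reference matrix: R_Y(theta) on the target when the control is |1>."""
    gate = np.eye(4)
    gate[2:, 2:] = ry(theta)
    return gate

def decomposed_controlled_ry(theta):
    """Same gate built from two CNOTs and two single-qubit rotations."""
    half = np.kron(np.eye(2), ry(theta / 2))
    minus_half = np.kron(np.eye(2), ry(-theta / 2))
    # Applied right to left: R_Y(theta/2), CNOT, R_Y(-theta/2), CNOT.
    return CNOT @ minus_half @ CNOT @ half

theta = 1.234
print(np.allclose(decomposed_controlled_ry(theta), controlled_ry(theta)))
```

The identity relies on the fact that conjugating R_Y(α) with X flips the sign of the angle: with the control in |0〉 the two half rotations cancel, and with the control in |1〉 they add up to the full rotation.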

Classification results
We benchmark the proposed quantum machine learning algorithm with the use of hand-written digits from MNIST and clothes images from fashion MNIST datasets. Examples are presented in Figure 5.
All the simulations are performed using the Cirq Python library for the construction of quantum circuits; the TensorFlowQuantum library [39] is used for the implementation of the machine learning algorithm with parameterized quantum circuits. We use the Adam version of gradient descent with a learning rate equal to 0.00005; the overall number of trainable parameters in the QCNN circuit is equal to 149. As a metric for the model performance, we simply use the accuracy of the recognition, and for a more detailed analysis two sets of experiments are done. In all the conducted experiments, parameterized quantum circuits are trained for 50 epochs.
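To give a sense of the scale of the parameter updates, a single Adam step can be sketched in plain NumPy. Only the learning rate and the number of parameters are taken from the text; the values of beta1, beta2, and eps are commonly used defaults assumed here, and the gradient vector is a hypothetical placeholder.

```python
import numpy as np

def adam_step(params, grads, state, lr=5e-5, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update; beta1, beta2, and eps are assumed default values."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grads
    state["v"] = beta2 * state["v"] + (1 - beta2) * grads ** 2
    m_hat = state["m"] / (1 - beta1 ** state["t"])  # bias-corrected mean
    v_hat = state["v"] / (1 - beta2 ** state["t"])  # bias-corrected variance
    return params - lr * m_hat / (np.sqrt(v_hat) + eps)

n_params = 149                 # number of trainable QCNN parameters
params = np.zeros(n_params)
state = {"t": 0, "m": np.zeros(n_params), "v": np.zeros(n_params)}
grads = np.ones(n_params)      # hypothetical gradient of the cost function

new_params = adam_step(params, grads, state)
print(new_params[:3])
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so each parameter moves by roughly the learning rate regardless of the gradient magnitude.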
Within the first set, training and classification are done for the case when the dataset consists of images that have a certain similarity, and, thus, the classification problem becomes more difficult. We use MNIST images of digits 3, 4, 5, and 6 for this part. Also, fashion MNIST images with labels 0, 1, 2, and 3 are used for this purpose.
The second experimental set is focused on images that strongly differ from each other, thus making the recognition process easier; MNIST digits 0, 1, 2, and 3 and fashion MNIST images with labels 1, 2, 8, and 9 are considered. The total number of considered images of each type is given in Table 1.
Each image vector is normalized to one, since only such vectors can be used by the amplitude encoding algorithm. Results of image classification are given in Table 2. Quantum circuits for multiclass classification are considered in Ref. [31]. QCNN examples provided within the documentation of TensorFlowQuantum [39] can also be generalized relatively simply to multiclass classification tasks. In the second column of Table 2 we provide results of experiments with circuits similar to those of Ref. [31]. In order to obtain these results, we replace all the $R_Y(\Theta)$ gates used at the entanglement steps by CNOT gates. Also, we remove all the parts of the circuit of Figure 1 that are placed after the regular layers, i.e., pooling layers, final layers, and the part with Toffoli gates. Entanglement of ancilla qubits with regular layers is done via CNOT gates according to Figure 2 of Ref. [31]. The third column of Table 2 contains results obtained with the full circuit of Figure 1. The significant improvement in the accuracy of classification is caused by two factors. The first is the use of parameterized entanglement in our circuit. Second, the increase in performance may be connected with the fact that our circuit is constructed in a way similar to classical neural networks: we use the qubit reduction procedure in analogy with the reduction of the number of layer outputs in the classical case, until the number of outputs becomes equal to the number of target classes. Note that in Figure 1 ancilla qubits are used only at the read-out step and no entanglement is needed between the ancillas and other qubits during the computational procedure, which can significantly simplify the requirements on the corresponding quantum hardware. We also compare the obtained quantum results with results of the CCNN with a similar number of parameters, which is 188 in our case. The structure of the CCNN is presented in Table 3.
Clearly, the classical results are more accurate, which indicates that with a similar number of parameters the classical model is still more expressive. An analysis of possible quantum advantage in ML tasks is presented in Ref. [46]. In their study, the authors analyze ML models based on kernel functions and show that, with enough data provided, classical methods become more powerful than the corresponding quantum algorithms. Thus, additional study is still needed to find ML tasks where quantum algorithms will outperform their classical analogs.
Overall, the QCNN can produce accuracies of multiclass classification that are qualitatively similar to those of the classical model if the numbers of parameters are comparable. We would like to mention that a similar level of accuracy has been achieved in Ref. [31] for the case of the 3-class classification problem. Here we have demonstrated this level of accuracy for 4-class classification tasks, which to the best of our knowledge is the first such demonstration.

Conclusion
Here we have demonstrated a quantum multiclass classifier, which is based on the QCNN architecture. The main conceptual improvements that we have realized are the new model for the quantum perceptron and the optimized structure of the quantum circuit. We have shown the use of the proposed approach for 4-class classification of MNIST images. As we have presented, the results obtained with the QCNN are comparable with those of the CCNN if the numbers of parameters are comparable. We expect that further optimizations of the perceptron can be studied in the future in order to make this approach more efficient. Moreover, since the scheme requires the use of multiqubit gates, qudit processors, where multiqubit gate decompositions can be implemented in a more efficient manner, can be of interest for the realization of such algorithms.

Data availability statement
Publicly available datasets were analyzed in this study. This data can be found here: https://deepai.org/dataset/mnist.

Author contributions
DB-formulation of algorithm, AM-software development, AB-software development, DT-project management, AF-project supervision.