Random and Systematic Variation in Nanoscale Hf0.5Zr0.5O2 Ferroelectric FinFETs: Physical Origin and Neuromorphic Circuit Implications

This work presents 2-bits/cell operation in deeply scaled ferroelectric finFETs (Fe-finFET) with a 1 µs write pulse of maximum ±5 V amplitude and WRITE endurance above 109 cycles. Fe-finFET devices with single and multiple fins have been fabricated on an SOI wafer using a gate first process, with gate lengths down to 70 nm and fin width 20 nm. Extrapolated retention above 10 years also ensures stable inference operation for 10 years without any need for re-training. Statistical modeling of device-to-device and cycle-to-cycle variation is performed based on measured data and applied to neural network simulations using the CIMulator software platform. Stochastic device-to-device variation is mainly compensated during online training and has virtually no impact on training accuracy. On the other hand, stochastic cycle-to-cycle threshold voltage variation up to 400 mV can be tolerated for MNIST handwritten digits recognition. A substantial inference accuracy drop with systematic retention degradation was observed in analog neural networks. However, quaternary neural networks (QNNs) and binary neural networks (BNNs) with Fe-finFETs as synaptic devices demonstrated excellent immunity toward the cumulative impact of stochastic and systematic variations.


INTRODUCTION
The advent of convolutional neural networks (Lecun et al., 2015) has made machine learning or neural network-based computation an inevitable choice for solving many complex tasks in recent times. The real-time processing of the enormous amount of data generated from the internet search engines, social networks, and edge devices in health care systems requires massive computing power in conventional von-Neumann computing architecture. The memory-bandwidth bottleneck in von-Neumann computing and the torrent of data generated on the internet every second have reinvigorated the research in brain-inspired computing. Although modern data centers, furnished by high-performance GPU or TPU, perform the data handling and classification task precisely, the power-hungry nature makes them inappropriate for the end-user devices. Artificial neural networks implemented in such a traditional von-Neumann computing system endure severe bottlenecks during the data transfer between segregated memory and processing units. Quintessentially, there are two pathways for implementing the artificial neural network (ANN).
The first one is the computer science-based approach of ANN (Fukushima, 1988;Riesenhuber and Poggio, 1999), which fails to imitate the proper brain functions in terms of power consumption and speed of operations. Therefore, a significant interest in the area of emerging non-volatile memory based on non-von-Neumann computing (Chen et al., 2015;Diehl et al., 2015;Gokmen and Vlasov, 2016;Bae et al., 2017;Ambrogio et al., 2018;Chang et al., 2018;Kim H. et al., 2018;Ernoult et al., 2019;Fu et al., 2019) has grown. The pivotal motivation of the present-day research on the neural network is building dedicated hardware modules for implementing lowpower, fast-computing units without affecting the recent trend of scaling.
CMOS-compatibility and low process temperature of hafnium zirconium oxide (HZO) makes HZO-based ferroelectric (Fe) finFETs an excellent candidate for logic, memory, and neuromorphic devices. This property can be attributed to its superior endurance and write speed as compared to Flash, significantly higher on-to-off current ratio than MRAM, as well as the negligible impact from random telegraphic noise due to charge-based operation, unlike RRAM. The advantage of HZO over other perovskite ferroelectric materials and Sidoped hafnium oxides (HSO) has been mentioned in previous reports, which involves ease of deposition by the ALD process, scalability to thin film, and lower process temperature (Muller et al., 2012;Jerry et al., 2017;Kim H. et al., 2018;Ali et al., 2019;Ni et al., 2019;Cheema et al., 2020). Recent reports have also shown that low process temperature, superior interface quality, and reducing the numbers of defect sites in HZO improve the endurance of the HZO-based transistors (Dutta et al., 2020;De et al., 2021a;De et al., 2021b;Khakimov et al., 2021). Apart from the device structure, low process temperature and interface properties, the pulse scheme, and bias-technique during the WRITE operation also play an important role in determining the WRITE-endurance limit of the device. However, the pivotal issue lies in the increasing stochasticity with scaling as well as inherent charge-trapping sites that may capture electrons or holes from the channel side (CS) or gate side (GS) (Dunkel et al., 2017;Alam et al., 2019). These effects create serious reliability issues in terms of degradation of endurance (Dunkel et al., 2017) and increased variation during program-erase (WRITE) operation in deeply scaled HZO-based ferroelectric FET (Fe-FET) devices.
Ferroelectricity is a crystal structure-dependent property engendered from the polarization catastrophe (Slater, 1950;Kittel, 1951;Cochran, 1959;Cowley, 1965). The noncentrosymmetric Pca 21 orthorhombic phase is responsible for ferroelectricity in HZO. The other phases, cubic or monoclinic symmetry, do not show ferroelectricity. Therefore, it is essential to form the orthorhombic phase in the HZO film to instigate the ferroelectric switching. Usually, hafnium oxide is doped with silicon, aluminum, or zirconium and subjected to thermal annealing under various conditions to stabilize the metastable ferroelectric orthorhombic phase (Frascaroli et al., 2015;Materlik et al., 2018;Park et al., 2019;Sultana et al., 2019). Despite the adoption of these stabilization techniques, atomic-layerdeposited (ALD) HZO films exhibit non-uniform crystal properties. There have been numerous efforts from the scientific community to find the possible solutions to mitigate these non-idealities for improving the performance of ferroelectric FETs. However, the previous studies on the variability of Fe-FETs have primarily focused on large (>1 µm 2 ) and planar MOSFET devices, in which the random distribution of ferroelectric-dielectric domains and trapping are the two primary sources of variation. Figure 1 provides a qualitative overview of Fe-FET's applicability as an artificial synapse and the scaling trends toward implementing the finFET-based neuromorphic computing platform from planar CMOS technologies.
The feasibility of 28 nm technology node-based Fe-FETs was successfully demonstrated with the fabrication of such synaptic device in GlobalFoundries facilities (Soliman et al., 2020). Figure 2A illustrates the 28 nm technology node-based FeFET with the schematic and transmission electron microscopic (TEM) image. The binary WRITE operation was conducted by 4.5 and −5 V pulse of 500 ns pulse width. A reset pulse of opposite amplitude preceded each WRITE pulse. Figure 2B shows pulse-driven binary switching capability among sixty devices. Gradual increasing pulse-driven (step size 0.1 V) granular switching capability in a single FeFET cell has been shown in Figure 2C. However, this work focuses on ferroelectric finFETs (Fe-finFETs) fabricated on silicon-on-insulator (SOI) wafers at the Taiwan Semiconductor Research Institute's nanofabrication facility. The primary geometrical difference between a planar MOSFET and a finFET is that the former is a planar structure, and the latter is a non-planar structure. The process involved in fin formation involves reactive ion etching (RIE) of the bottom electrode (Si-fin), which is not required in planar MOSFETs. We have observed that during RIE, the line edge roughness of the sidewalls of the Si-fin is increased, and the RIE via H-Br plasma creates a Si-Br bond at the interface (Bestwick and Oehrlein, 1990). The primary issues with the degradation in Fe-finFET's performance are three-fold. First, the random variation of the ferroelectric and dielectric phase in gate-stack was reported by De et al. (2021a) that long duration annealing could mitigate the issues associated with the random distribution of the ferroelectric-dielectric phase in HZO. The second is parasitic charge clouds at the interface and fin line-edge-roughness (LER), which increases the charge-trapping probability and inhibits fast WRITE. Previous reports have shown that surface treatment can alleviate the issues associated with the poor interface quality and reduce the device-to-device variations up to a maximum 10% deviation from the mean value (De et al., 2021a;De et al., 2021b). The final and third one is systematic retention degradation, which directly impacts the inference operation (Baig et al., 2021).
This work focuses on evaluating the impact of systematic retention degradation and random device-to-device and cycle-tocycle variations on the training and inference accuracy of Fe-finFET-based neural networks. In the first part of this article, we describe the fabrication, characterization, and analysis of variability in Fe-finFETs. The second part of this article evaluates the impact of experimentally observed device-todevice and cycle-to-cycle variation on training and inference operation. Neural network simulation is performed using the CIMulator (Le et al., 2020) software platform. Fe-finFETs may also be used for online training, which requires constant updating of weight coefficient (memory write) during training. With endurance up to 10 9 demonstrated for HZO films (Chung et al., 2018;De et al., 2021b), such application is possible. For inference-only applications, Fe-finFETs form a memory array to perform multiply-and-accumulate (MAC) tasks. It is programmed only one time to store the weight coefficient of the neural network and used for MAC operations by read-only (READ) operations without changing weights (memory write), which necessitates the stable data retention capability. Systematic variations, such as retention degradation, are expected to play an essential role in inference-only applications, whereas cycle-tocycle variation, physically originating from random telegraphic noise, is expected to play a crucial role in both inference and training applications. Based on our findings, we conclude that the quaternary neural network (QNN) demonstrates higher immunity than the analog neural network toward systematic and random variation.

MATERIALS AND METHODS
The voltage-dependent partial switching mechanism in HZObased Fe-finFETs (Trentzsch et al., 2016;Jerry et al., 2017) can be exploited to obtain analog synaptic behavior. However, the infidelity in updating the conductance states of all the devices in the memory array in a reliable manner during the weight update process possesses a fundamental challenge in designing an analog neural network using FeFET devices as a synapse. The following section extends the investigation on such variation in Fe-finFET devices and their applicability as synaptic devices in neural networks' training and inference operation.
Hf 0.5 Zr 0.5 O 2 -Based Ferroelectric FinFET Fabrication, Characterization, and Modeling Nanoscale n-type and p-type tri-gate Fe-finFETs were fabricated on an SOI wafer using a gate first self-aligned high-K and metal gate process (De et al., 2021b). The devices under consideration have a gate length (L) of 70 nm and fin width (W) of 20 nm. The fin height (H) for all devices was fixed at 30 nm, and the thickness of the ferroelectric (T Fe ) layer was 10 nm. Figure 3 schematically illustrates the process flow. During the fabrication process, dry oxidation was performed after fin formation by e-beam lithography. This oxide layer was used as a sacrificial layer to reduce fin line-edge-roughness engendered from lithography and dry-etching process. This oxide layer was removed by diluted hydrogen fluoride (DHF) solution, followed by chemical oxidation via hydrogen peroxide (H 2 O 2 ) solution to form a relatively thinner interfacial oxide layer of thickness 0.8-1 nm. This process reduces the interfacial trap densities by removing the impurities from the surface of the silicon (Si) fin. The gate stack consisting of HZO and TiN was formed by the atomic layer deposition (ALD) and physical vapor deposition (PVD) method on top of the interfacial layer. Figure 3 also displays the schematic process illustration and transmission electron micrograph (TEM). The electrical characterization of the devices was carried out by an Agilent B1530A Arbitrary Waveform Generation and Measurement unit. The program (WRITE-1) and erase (WRITE-0) characteristics were analyzed to understand device-to-device and cycle-to-cycle variation in fabricated devices. The devices were subjected to 1-µs-wide pulses with positive or negative amplitudes for characterizing their dynamic behavior, including their response to WRITE operation, multilevel programming behavior, symmetry, and linearity. The characterization of the multi-fin Fe-finFET device was conducted by 1-μs-wide pulses. The low threshold voltage (LVT) state was achieved by applying an +8 V pulse at the gate terminal. On the other hand, a high-voltage state (HVT) was achieved using a pulse of opposite (negative) polarity with similar width and magnitude. To facilitate granular switching and obtain multilevel potentiation and depression characteristics in a single device, we applied a 1-μs-wide pulse of increasing (3-8 V with a step of 0.2 V) and decreasing (−0.8 to −5.4 V with a step of −0.2 V) amplitudes to partially polarize the HZO stack in Fe-FET ( Figure 4A).  Although we have observed a ferroelectric domain switching-assisted threshold voltage (V th ) shift upon the application of WRITE pulse of much lower amplitude, the WRITE scheme described in this work is suitable for wafer-scale integration with higher linearity (Lederer et al., 2021). The drain voltage was kept at 0 V during the WRITE operation. Each set (potentiation or depression) pulse was preceded by a reset pulse (±8 V) to drive the conductance back to the starting point. The incrementing WRITE pulse amplitude in each step increments the remnant polarization of the HZO film by a small amount, which inevitably changes the V th and channel conductance of the Fe-finFETs (Faran and Shilo, 2010;Van Houdt and Roussel, 2018).
The READ of the memory state of the device was accomplished 100 μs after each WRITE pulse. The drain terminal was kept at a constant 100 mV during the READ operation, and a ramp voltage was applied at the gate terminal. The ramp magnitude was varied during the READ operations, similar to Lu et al. (2020), to eradicate the trapping in the HZO film. The conductance was extracted from the I d -V g curve of potentiation and depression operations at a specific gate voltage. We have used various gate voltages of values 0, 0.25, 0.5, 0.75, 1, and 1.25 V to extract the conductance. Figure 4B shows the highly linear and symmetric threshold voltage modulation traits obtained by partial polarization of the ferroelectric stack in a Fe-finFET with a gate length 70 nm. We have applied positive and gradually increasing pulses for LTP and negative pulses for LTD. Each pulse changes the remnant polarization of the HZO stack, which changes the device's threshold voltage, thus resulting in a new memory state. We have obtained a maximum of 50 numbers of memory states and a minimum of 26 numbers of memory states for a single device. This variation in available conducting states can be attributed to charge trapping, channel percolation, and nucleation growth domain dynamics (Xiang et al., 2020;Xiang et al., 2021). Figure 4C shows the gradual change in channel conductance during potentiation and depression. The channel conductance is modeled according to one of our previous publications (Lu et al., 2020): The model is based on the industry-standard BSIM4 model (Dunga et al., 2007). The term "V gsteff " denotes the effective gate overdrive voltage. UA, EU, and Δ account for the vertical field-dependent mobility degradation, whereas the drain-tosource series resistance effects are implemented by the R ds term. The term UD accounts for Coulomb scattering. The details of this model have been described in (Lu et al., 2020), and this model shows only 1.4% r.m.s error from the measured data (Lu et al., 2020). Although the number of threshold voltage states is constant with reading gate voltage (V g read ), the number of available conductance states highly depends on V g read . Primarily, if V g read is too low, most of the operational ranges will fall within the sub-threshold region (below 200 nA) and do not count as distinct states. At the same time, however, the I on -I off ratio will be degraded when V g read is too high. Therefore, the optimal choice of V g read is critical (Lederer et al., 2021).
Although the characterization of a single stand-alone device depicts 26-different conducting states, the device-to-device variation in WRITE operation inhibits 26-level operation in memory arrays. Amid variation, the conductance states start overlapping, and the numbers of available forms start reducing. The root causes for device-to-device variation are the ferroelectric switching process, random distribution of coercive voltage, ferroelectric-dielectric domains, presence of different trapping sites, and surface roughness at the interfaces (De et al., 2021b). The impact of these variations becomes acute, and the number of programming states gets limited to only two when the device is further scaled down. We have obtained bidirectionally programmable 2 bits/cell operation in the fabricated finFET devices with 70 nm gate length and 20 nm fin width. Figure 5A illustrates the pulse scheme used for WRITE operations. Figure 5B demonstrates the 2 bits/cell operation conducted among 12 devices of similar dimensions (L 70 nm; W 20 nm). The probability distribution of the channel current, demonstrating the device-to-device variation is shown in Figure 5C. The READ operation was carried out by non-disturbing DC sweep at the gate, and the extraction of the channel current was carried out at the 1 V value of V g read . Figure 5D demonstrates cycle-to-cycle variation for WRITE operation for four equi-distance current levels. The stability of V g read is of utmost essential to implement 2 bits/cell operation. V g read should not alter the programmed state, and it should be high enough to avoid low-frequency noises. Apart from the multi-bit operation, linearity, and symmetry in WRITE operation, device endurance and retention are two other essential characteristics of neuromorphic applications. Typically, endurance above 10 7 cycles is required for online training of a fully connected multi-layer perceptron neural network (De et al., 2021b). The endurance characteristics of the fabricated Fe-finFET devices are measured by applying 1-μs-wide pulses of amplitude ±5 V. Figure 6A illustrates the endurance characteristics along with the pulse scheme. The endurance over 10 9 write cycles in fabricated Fe-finFET devices makes them ideal for online training operations. Once a neural network is successfully trained using online training, the inference operation must be stable. As the online training of neural networks requires high endurance in the synaptic devices, retraining the neural network to compensate for accuracy loss during inference is not a viable option. Therefore, the retention of the synaptic devices plays an essential role in maintaining stable inference accuracy. The conductance drift due to retention degradation was measured by applying a ±5 V pulse of 1 µs width ( Figure 6B). The change of G ch for the intermediate states was obtained by linear interpolation. The difference in G ch was captured using a compact model to apply in system-level neural network simulation. We observe that the impact of systematic variations such as retention degradation has a more severe effect on the system-level performance of neural networks than the random variations. Finally, we have benchmarked the performance of the fabricated finFET devices with other emerging non-volatile memories ( Table 1).

Fe-FinFET as a Synapse in Neuromorphic Applications
To quantify the impact of variations in Fe-finFETs, neuromorphic simulation has been performed. The modeled LTP and LTD characteristics along with the experimentally calibrated device-to-device and cycle-to-cycle variations or the standard deviations (σ D2D and σ C2C ) in V th distribution have been used to train a Fe-finFET-based multilevel perceptron (MLP)-based neural network with the MNIST data set (Lecun et al., 1998). The layers of the neural network are illustrated in Figure 7A. Figure 7B shows the implementation of ferroelectric FIGURE 6 | (A) Endurance characterization conducted by ±5 V pulse of 1 µs width shows excellent performance up to 10 9 cycles. Further reduction of amplitude and width of WRITE pulses can improve the WRITE endurance (De et al., 2021b). (B) Endurance and retention characteristics are always counter complimenting. WRITE pulses of lower amplitude and width always engender good endurance characteristics but they result in poor retention. Therefore, optimization of the pulse scheme is necessary to create a balance between endurance and retention characteristics. The retention characteristics, measured at 85°C, show up to 10 4 s, illustrate excellent extrapolated data retention over 10 years with the same pulse amplitude and width used to measure the endurance. devices as a pseudo-crossbar synaptic memory array (Jerry et al., 2017) for the neuromorphic simulation in our CIMulator platform (Le et al., 2020). The CIMulator platform is used to estimate classification accuracy for given device's characteristics and statistical distribution. Two scenarios are considered. The first scenario is online training, wherein 60,000 MNIST image samples are used to train the hardware neural network, as illustrated in Figure 7. The back-propagation algorithm was adopted for training, with a batch size of 600. In the online training scenario, each device-to-device variation (ΔV th ) is fixed at the beginning of training and remains unchanged throughout the training process. For cycle-to-cycle variation, on the other hand, ΔV th changes randomly during READ or WRITE throughout the training process for each device in each cycle. The second scenario is offline training, wherein neural network weight coefficients are pre-trained in software (without considering hardware-related variation). The weights are subsequently written into hardware. Such hardware neural networks will operate differently from software with errors due to device-to-device variation. In this case, there is no chance for the neural network to have its weight adjusted to compensate for hardware-related device-to-device variation.

RESULTS
The online training simulation of Fe-finFET-based synaptic arrays shows that the continuous weight adjustment can avoid the accuracy drop due to device-to-device variations during the training process. While online training is completed, there is minimal accuracy degradation ( Figure 8A) due to the sole impact of the device-to-device variation in the WRITE process. Figure 8A highlights that the online training is beneficial toward the robustness of neural networks in the presence of device-to-device variation. For example, for a device with 10% higher intrinsic maximum conductance, the training algorithm automatically adjusts it to have a higher V th , reducing the conductance by 10% to compensate for the higher intrinsic conductance. This is not the case for offline training. The accuracy drops below 90% when the device-to-device variation exceeds 250 mV.
Unfortunately, the online training cannot fully compensate for cycle-to-cycle variation, as this variation source changes from one cycle to another ( Figure 8B). The accuracy quickly degrades as cycle-to-cycle becomes larger. A 400 mV cycle-to-cycle variation brings down the accuracy to 90% during the training process. Fortunately, our hardware measurements show that cycle-tocycle variation is only 17.7 mV in V th , so accuracy degradation is minimal. Figure 8C shows the cumulative impact of the deviceto-device and cycle-to-cycle variation. The degradation in performance in terms of accuracy becomes acute in the offline training scenario. The accuracy was dropped from 97.46 to 90.26% by the device-to-device variation alone. The cumulative effect of the device-to-device and cycle-to-cycle variation deteriorates the recognition accuracy down to 46.81%. Such non-linear behavior is a distinct characteristic of neural networks. When variability is small within a certain threshold, added cycle-to-cycle variation has little impact. On  Frontiers in Nanotechnology | www.frontiersin.org January 2022 | Volume 3 | Article 826232 the other hand, once such threshold is reached, the slight randomness of 17.7 mV has a significant effect on degrading neural network classification accuracy. It is worth noting that in CIMulator, we have adopted the accumulated weight updated method (Hubara et al., 2018) to train the neural network. It is a methodology similar to the Σ-Δ modulation method in communications, which saves the weight residue for each training cycle to compensate for the insufficient weight resolution. It provides good immunity for neural networks with low-weight precision, or low number of states. The downside, however, is the necessity of additional circuits (digital memory) to store the residue portion of the weights, which are not yet written to hardware.
In Figure 9A, we compare the accumulated update mode and the conventional linear update mode, where the weight is rounded to the nearest quantization value with residue discarded. The inference accuracy for linear update mode is near 10% (no recognition capability) until the weights have 6 bits or more (>90% accuracy). On the contrary, the accumulated weight update method requires much lower precision, where 97% inference accuracy can be achieved with only two states or 1 bit. As low V g read is a pre-requisite for low-power applications of neural networks, the accumulated weight update method is a promising algorithm to train Fe-FET with relaxed requirements for high V g read (more number of states). Although previous researchers have shown multilevel programming in , this is the first testament to the number of conducting states on gate voltage for read, V g read . Therefore, the impact of the V g read on inference accuracy has been investigated. Figure 9B shows the effect of V g read for a specific number of bits. V g read as 1 V, it turns out to be the most optimal solution for all four cases shown in the figure. As discussed in the previous section, this V g read dependency results from different on-off ratios of channel conductance at different gate voltages during READ operations.
Finally, we evaluate the cumulative impacts of device variation, retention degradation (assuming G ch drops to 22.6% for HRS and 74.3% for LRS, with intermediate states linearly interpolated), and cycle-to-cycle read variation for inference accuracy ( Table 1). Scenario A is the case for the binary neural network (BNN), where all non-ideal-effects-induced accuracy degradation is minimal due to a more significant noise margin. Scenario B is where four-level states are programmed (as in Figure 5), displaying excellent immunity FIGURE 8 | (A) Impact of device-to-device variations is significant only for offline-trained neural networks. (B) Impact of cycle-to-cycle variations cannot be compensated by online training. Hardware suggests its effect is negligible. (C) Effect of variations on training accuracy is minimal for hardware-calibrated variability for the online training scenario but is significant for the offline training scenario.
FIGURE 9 | (A) Accumulation mode operation requires less number of bits to achieve high accuracy compared to the trivial weight update mechanism. However, this accumulated weight update method requires an extra high precision volatile memory block. (B) V g read optimization for recognition accuracy.
Frontiers in Nanotechnology | www.frontiersin.org January 2022 | Volume 3 | Article 826232 8 toward such variations. Scenario C is the case where we have considered analog synaptic weights described in Figure 4C. Severe degradation of inference accuracy is observed with retention degradation-induced synaptic weight alteration.

CONCLUSION
The overall performance of a neuromorphic system is the outcome of the confederated performance of the device, peripheral circuits, network architecture, and algorithm. In this work, we fabricated a nanoscale HZO FeFET device and analyzed the physical reasons for two primary variation sources: 1) paraelectric/ferroelectric phase mixture due to incomplete or insufficient crystallization, which results in the device-to-device variation and 2) random telegraphic noise due to trapping and de-trapping events, which causes the apparent variation from cycle-to-cycle. Experiments are underway to mitigate such variability, especially for small devices. We then translate the observed variability to non-ideality in neural network applications. If online training is possible, 96.34% for MNIST handwritten digits recognition in the presence of variations is achievable. However, offline training may be required due to system constraints, in which case even a moderate device-to-device variation becomes alarming and must be mitigated through further material optimization. We concluded that the impact of experimentally observed variations on systemlevel accuracy could be negligible by manipulating the neural network algorithm and architecture. Quaternary and binary neural networks both demonstrated excellent immunity toward the cumulative impact of stochastic and systematic variations, whereas the hardware neural network with analog weights shows vulnerability to variation and retention degradation ( Table 2).

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding authors.

AUTHOR CONTRIBUTIONS
The manuscript was written by SD and DL. The primary idea of this research was developed by SD, DL, Y-JL, and TK. Fabrication of the devices and electrical characterization was performed by SD, B-HQ, FM, ML, and TA. MB and H-HL developed the neuromorphic simulation platform and conducted neural network simulation. P-JS, C-JS, and Y-JL developed the Fe-finFET fabrication platform.