An RRAM-based implementation of a template matching circuit for low-power analogue classification

Foster, Patrick; Papandroulidakis, Georgios; Serb, Alex; Stathopoulos, Spyros; Prodromakis, Themis

doi:10.3389/felec.2025.1568377

ORIGINAL RESEARCH article

Front. Electron., 30 April 2025

Sec. Nano- and Microelectronics

Volume 6 - 2025 | https://doi.org/10.3389/felec.2025.1568377

An RRAM-based implementation of a template matching circuit for low-power analogue classification

Patrick Foster*

Georgios Papandroulidakis

Alex Serb

Spyros Stathopoulos

Themis Prodromakis

Centre for Electronics Frontiers (CEF), Institute for Integrated Micro and Nano Systems, University of Edinburgh, Edinburgh, United Kingdom

Recent advances in machine learning and neuro-inspired systems enabled the increased interest in efficient pattern recognition at the edge. A wide variety of applications, such as near-sensor classification, require fast and low-power approaches for pattern matching through the use of associative memories and their more well-known implementation, Content Addressable Memories (CAMs). Towards addressing the need for low-power classification, this work showcases an RRAM-based analogue CAM (ACAM) intended for template matching applications, providing a low-power reconfigurable classification engine for the extreme edge. The circuit uses a low component count at 6T2R2M, comparable with the most compact existing cells of this type. In this work, we demonstrate a hardware prototype, built with Commercial-Off-The-Shelf (COTS) components for the MOSFET-based circuits, that implements rows of 6T2R2M employing $T i O_{x}$ -based RRAM devices developed in-house, showcasing competitive matching window configurability and definition. Furthermore, through simulations, we validate the performance of the proposed circuit by using a commercially available 180 nm technology and in-house RRAM data-driven model to assess the energy dissipation, exhibiting $60 pJ$ per classification event.

1 Introduction

In recent decades, the conventional von Neumann architecture that is dominating most modern computing systems seems to be reaching a performance saturation point mainly due to the well-documented memory wall bottleneck occurring as a result of the separation between computing units and memory modules (Wilkes, 1995; McKee, 2004). As further miniaturisation of MOSFET technology provides diminishing performance gains with every new CMOS generation, a lot of focus has been placed on investigating novel computing technologies and architectures to satisfy the increasing need for higher performance and low energy computing. This is especially important for power and area restricted applications at the edge and extreme edge (Menon et al., 2021). One important avenue of investigation is the neuro-inspired architectures, where the close integration of computing and memory is observed (Graves et al., 2020). This co-location of computing and memory enables the implementation of novel in-memory (IMC) or near-memory (NMC) computing approaches that showcase promising results in continuing the performance improvements in critical algorithms, such as machine learning (ML), while being also energy efficient (Sebastian et al., 2020). One such paradigm is the introduction of associative memories that can perform template matching operation, a feature critical for signal processing at the edge where noisy inputs require fast and low power classification.

One of the most well-documented models of neuro-inspired associative memories are the content addressable memories (CAMs). CAM is a broad term describing memory-centric technologies where the memory cells compare their stored state with an input locally and in parallel. This allows a bank of CAM to be quickly searched by applying the information that is being sought to the input of the array and monitoring the cell outputs (Serb et al., 2018). The combining of low level computing tasks with memory blurs the boundary between the two and has the potential for improvements in computing (Karam et al., 2015) especially in ML applications for resource-constrained implementations. CAM can serve as the foundation for novel Artificial Neural Nets (ANNs) and other neuromorphic systems especially when co-integrated with emerging memory technologies, such as Resistive RAM (RRAM) (Pedretti et al., 2021a).

In this work, we are showcasing a RRAM-CMOS ACAM circuit capable of mapping two analogue thresholds to identify a specific matching window. We provide detailed simulations and proof-of-concept experimental results by integrating in-house developed $T i O_{x}$ -based RRAM devices. The proposed ACAM cell acts as reconfigurable building block for a template matching systems, which performs low-power operation for associative inference at the edge. The reconfigurability is introduced to the circuit through the RRAM technology. We provide a detailed analysis through SPICE simulations of the proposed circuit to assess its performance and then develop a proof-of-concept experimental board using both Commercial-Off-The-Shelf (COTS) and experimental RRAM devices to further investigate the behaviour of such template matching system. More specifically, in Section 2, we provide a brief overview of existing RRAM ACAM technologies and their applications. In Section 3, we explain the circuit design and operation. In Section 5, we are showing simulation results of the novel split TXL circuit design using a commercially available 180 nm technology. In Section 4, we are exhibiting hardware results, and in Section 6, we are providing concluding remarks.

2 Background

In general, CAMs can be divided into two types: digital CAM and analogue CAM (ACAM). Digital CAM stores only binary information and compare this stored value with a binary query vector (Lin et al., 2003). In some cases, CAM can have a third stored state for “do not care”; this is called ternary CAM (Graves et al., 2020), (Arsovski et al., 2003). SRAM-based CAM systems, either binary or ternary, are well-known systems and are commonly found deployed as network switches, where they store large tables of MAC addresses for fast signal routing (Liu, 2002). While CMOS binary CAM is well established, there is ongoing research in the area, trying to integrate emerging non-volatile memory (NVM) elements, such as RRAM (Pan et al., 2021). Given the multi-bit memory capabilities of these emerging memory technologies, the rapid development of novel ACAM design has been exhibited.

ACAM design follows the main principles of digital CAM design but with appropriate circuit modifications to enable the in-situ processing of analogue signals. This is usually achieved by using multi-bit memory cells. In addition to their use in neuromorphic systems (Hock et al., 2013), this type of CAM can also be used in analogue signal analysis methods such as template matching (Graf and Jackel, 1989). In this application, ACAM cells are organised as memory arrays with each memory cell having a small additional circuitry for a per cell comparison operation between an input and the stored value (Serb et al., 2018; Pedretti et al., 2021a). The ACAM array performs a massively parallel search operation for all available templates inside the memory and responds with the address of the classified template if a match event occurs (Figure 1). Unlike binary CAM research, where the goal is to add nonvolatility to the cells without increasing power dissipation, analogue CAM research seeks to replace high power dissipation and area comparators with simplified circuits that take advantage of the analogue nature of emerging memory technologies. The advent of emerging memory technologies enables the implementation of CAMs with analogue memory components making possible the usage of such systems to perform template matching in the analogue domain without the need to convert information from analogue to digital (Serb et al., 2018). While there are attempts to use other emerging memory technologies such as FeRAM (Yin et al., 2020; Rajaei et al., 2021a) for this purpose, RRAM implementations of ACAM can showcase advantages due to the granularity of conductance states in RRAM technologies and the ease of tuning offered by resistive elements (Pedretti et al., 2022; Rajaei et al., 2021b; Pedretti et al., 2021b). Such implementations offer a promising candidate for near-sensor fast template matching engines for a new era of bio-signal processing systems at the extreme edge.

Figure 1

Figure 1. Concept schematic showcasing an example analogue content addressable memory (ACAM) parallel search operation. Each cell performs a comparison between an input of the analogue template query and its match window, as defined by its memory contents. If all cells in a row have a hit then the ACAM responds with the memory address of the full matching row. The match/mismatch can be define according to an array-wide threshold.

Recently, a range of different ACAM cell designs employing emerging memory technologies have been proposed towards implementing novel template matching systems. In (Serb et al., 2018), a 6T2R design has been shown to implement a template matching circuit capable of detecting specific voltage levels using a 2T2R and a 4T memory readout circuit. This design showcase good area utilisation and showcase an early look on how hybrid RRAM-CMOS circuits can be used to map a threshold point for which a short event-triggered pulse is produced. In (Li et al., 2020), a $T a / T a O_{x}$ -based ACAM design using also a 6T2R configuration has been showcased. In this design, two different thresholds are mapped in the 2T2R circuit which is implemented as two separate 1T1Rs. The rest of the cell (4T) implements the window matching comparison. The circuit shows good area and match window configurability but the use of 1T1R voltage dividers when reading the memory could result in increased power consumption.

3 Circuit design and operation

The circuit proposed in this work, shown in Figure 2, is based upon a skewed inverter design showcased in Serb et al. (2018). The cell design in Serb et al. (2018) uses a hybrid RRAM-CMOS inverter to adjust the switching point of a second inverter stage, with a current mirror to detect the higher current at that switching point. By comparison, our proposed design uses two hybrid inverters to map two analogue thresholds to enable a higher configurability of the matching window. Two complementary FET pairs with source degeneration are used as the input stage with their outputs assessed by an XOR mode transmission gate. The current through this gate feeds provides the output signal, which can be connected in parallel with other cells to a simplified matchline pull-up network to perform template matching. The ratio between source degeneration on the input PMOS and NMOS alters the threshold of the inverter. As the threshold is dependent on the ratio between the top and bottom degeneration resistors and not the total resistance of the branch, the threshold can be controlled by altering the state of only one of resistive elements. As such, this design uses a single dynamic RRAM device for each hybrid inverter to control each threshold, with the opposing degeneration being provided by a static balancing resistor.

Figure 2

Figure 2. Circuit schematic of the split TXL ACAM cell proposed in this work. The circuit implements two hybrid RRAM-CMOS inverters ( $R_{M 1}$ - $M_{P 1}$ - $M_{N 1}$ - $R_{1}$ and $R_{2}$ - $M_{P 2}$ - $M_{N 2}$ - $R_{M 2}$ ) that map the low and high thresholds (encoded as the outputs $O U T_{L O}$ and $O U T_{H I}$ of the hybrid inverters) and two extra MOSFET devices ( $M_{PML}$ and $M_{NML}$ ) that performs the comparison between the low and high threshold and connect the output with the $B L$ charging the matchline $M L$ for the case of match. The $B L$ has been precharged to $V_{D D}$ before the ACAM operation. Devices $R_{1}$ and $R_{2}$ , that operate as the other part of the resistive ratio determining the matching window, are implemented as static resistive elements.

When the input is above the threshold of the $R_{M 1}$ branch inverter and below the threshold of the $R_{M 2}$ branch inverter, both the PMOS and NMOS of the output stage ( $M_{MLP}$ and $M_{MLN}$ , respectively) conduct, thus connecting the charged $B L$ to discharged $M L$ . The $B L$ is charged to $V_{D D}$ and the $M L$ is discharged to $G N D$ . This disrupts the conventional operation of precharged $M L$ that remains charged in the case of a full row match or drains for the case of a single mismatch. Instead, this method enables the charge of the discharged $M L$ by a packet of charge controlled by the output branch of each cell in a row. This technique provides greater flexibility in the analogue operation of the row since we can define how many charge packets are required for a match and decide on a configurable threshold. This threshold is implemented as part of the matchline readout circuitry (i.e., the threshold voltage of a sense amplifier). While this configuration permits the independent control of each threshold with a single programmable resistive component, the required range of memristor states is quite high, although not to the degree required by previous works (Li et al., 2020) where the ratio of minimum to maximum resistive state was $5 \times$ higher. This is because the configurable element must go both above and below the balancing resistor by a factor of 10 to achieve a fully configurable range of threshold values that could be helpful in supporting a wide range of applications.

4 Hardware measurement

To experimentally test the TXL proof-of-concept we are exploring the design space of the circuits initially by using ideal resistors and then testing with in-house developed RRAM devices. Two CAM array PCBs were designed for use with an existing instrument platform (Foster et al., 2022). The PCB implementations of this cell use the SSM6L36TU,LF (Toshiba, 2014) complementary MOSFET pair, along with $1 MΩ$ balancing and output resistors. The FETs used are small signal MOSFETs with threshold voltages comparable to integrated FETs. The high value balancing resistor results in low supply current in the RRAM-skewed inverters, allowing the inverters to switch a little closer to the supplies, reducing the required headroom. For the dynamic element, the resistor model (shown in Figure 3a) used a multiplexer switched array of 16 resistors, geometrically spaced between $100 kΩ$ and $10 MΩ$ . The RRAM PCB design (shown in Figure 3b) used a PLCC68 package of $30 \times 30 μm T i O_{x}$ -based RRAM devices (Michalas et al., 2018). To write the state of the RRAM for testing, SMU channels were connected to the sources of $M_{P 1}$ and $M_{N 2}$ (Figure 2), with the SMUs set to high impedance during tests to avoid interfering.

Figure 3

Figure 3. Photographs of the PCB prototypes. (A) The initial PCB prototype using conventional resistor components for implementing rows of the proposed ACAM cell, (B) The second RRAM-MOSFET PCB prototype implementing rows of the proposed ACAM cell with the connectivity to the resistor components being replaced by a PLCC68 socket for connecting experimental RRAM packages.

To assess how the window responded to changes in the control devices, the current output is denoised with a 50 sample moving average filter and the differential calculated. The window width is calculated as the difference in input voltage between the minimum and maximum of the derivative (Figure 4). This method was chosen as it provides a meaningful result in circumstances where the window peak is not consistent, where a more common method such as full width at half maximum (FWHM) might not. The differential method typically gives a slightly more pessimistic valuation of the circuit performance than FWHM.

Figure 4

Figure 4. A diagram of the method used to determine window width. In this case the window width is calculated as $1.1 V$ .

4.1 Resistor-based test PCB

The behaviour of the resistor-based emulator board was measured by setting one of the configurable elements to its maximum value and conducting an input sweep at each step of the other configurable element. In this PCB design, the CAM circuit was supplied at $1.8 V$ and the input swept from $0 V$ to the supply voltage by stepping through every DAC code. The current was measured at every step, for a total of 5,898 samples. More information on how the DAC operates in our experimental setup using ArC TWO instrumentation board can be found in Foster et al. (2022).

As can be seen in Figure 5, the circuit produces a significant output current at input values between the two thresholds. The upper threshold was measured to be almost linear with the logarithm of the control resistor. The upper threshold displays the characteristic behaviour of a current starved inverter, because the NMOS in the output branch acts as a voltage follower. This somewhat compromises the specificity of the circuit, but not to the extent that would justify the extra inverter needed to swap it for a PMOS. The lower threshold shows non-linearity behaviour of control, with a pronounced s-curve. Additionally, the maximum window width was calculated for a range of supply voltages. This was found to be linear, signifying that the headroom required is a function of the threshold value of the MOSFETs used, regardless of the operating current.

Figure 5

Figure 5. Graphs showing the effect of M1/2 on the match window of the PCB resistor model. (a) An IV sweep of M1 between $10 MΩ$ and $100 kΩ$ , showing that the threshold voltage increases as M1 drops. (b) An identical sweep of M2, showing that the threshold voltage decreases as M2 drops. (c) The width of the window as a function of resistor state. (d) The maximum window width as a function of the supply voltage.

4.2 RRAM-based test PCB

For the RRAM-based PCB tests, the RRAM devices were set to the preferred conductance state with $100 - 500 μs$ pulses and then read at $250 mV$ , before performing a sweep of the input voltage in $1 mV$ steps, reading the current at each step for a total of 1801 samples per trace. For the purposes of calculating the threshold position, the current output curve was denoised with the same 50 sample moving average filter. The mono-layer $T i O_{x}$ -based RRAM devices used in this test (Michalas et al., 2018) are an experimental technology and exhibit some volatility and drifting of their conductance states. In the sweep of M1, some tests saw the memristor taking random values within a small range, as can be seen in Figure 6a. In the sweep of M2, a test at high resistive state saw the control device switching between two distinct values, as can be seen in the outermost trace of Figure 6b. This may be indicative of the formation (and disruption) of conductive filaments within the titanium oxide layer of the RRAM devices, as this is one of the hypothesised mechanisms of the memristive behaviour of these devices (Zaffora et al., 2021).

Figure 6

Figure 6. Graphs showing the effect of M1/2 on the match window of the PCB RRAM model. (a) An IV sweep of M1 between $8 MΩ$ to $30 kΩ$ with M2 at approximately $7 MΩ$ , showing that the threshold voltage increases as M1 drops. (b) A similar sweep of M2 from, $200 kΩ$ to $8.6 MΩ$ with M1 at approximately $6 MΩ$ , showing that the threshold voltage decreases as M2 drops. (c) The lower threshold voltage as a function of the M1 state. Note that the X-axis is reversed compared to subfigure (d). (d) The upper threshold as a function of M2 state.

Toward avoiding issues with device-to-device and cycle-to-cycle stochasticity of RRAM devices, instead of assessing the window width, the position of each threshold was calculated. While this is less useful in characterising the circuit, it prevents issues with setting one threshold from affecting the accurate assessment of the other. Despite this, the tests on the bottom threshold showed a similar s-curve shape to the resistor model. The shape of the window produced is also very similar to the shape obtained in the resistor model. The upper threshold was not so well behaved, showing significant loss in the sharpness of the transition. Further, there was a loss of monotonicity at high resistive states, although this may be due to changes in the state of the M2 device between the measurement taken at the start of each test and the state it held during the test. Despite the irregularities, the traces produced in the tests of the memristor model are clearly of comparable shape and range of position to the results of the resistor model. Thus, we demonstrate that resistor tuned models of TXL circuits can be used as a good emulator for RRAM for the assessment of characteristics not explicitly related to the threshold shape, such as power consumption and footprint area. Through this method we are able to perform hardware experiments using conventional technologies and use the RRAM arrays in a program-once-read-many operational scheme, where the weights are defined through software and resistor-based emulator array testing and then mapped into a RRAM array.

4.3 RRAM-based demonstrator board

In order to demonstrate the functionality of the proposed TXL circuit alongside real RRAM devices designed and fabricated through in-house processes, we have designed and implemented a custom demonstrator board, shown in Figure 7a. The demonstrator board is compatible with the instrumentation infrastructure and can be used to test the functionality of multiple TXL circuits connected in matchlines and assess their behaviour. For the experiment, we are using MOSFET-based COTS circuits in addition to RRAM arrays we are fabricating in-house. Custom Python software script is controlling the different operation modalities of the demonstrator board. The custom demonstrator board used in the experiment is shown in Figure 7b.

Figure 7

Figure 7. (a) The experimental setup of the TXL demonstrator board is shown. The TXL demonstrator consist of two main parts: the RRAM array and the CMOS components implementing the TXL circuits, the Sample & Hold, Trigger system and control. The demonstrator is attached to the ArC TWO instrument. A benchtop power supply is used to provide the main voltage sources of the instrument board. The TXL demonstrator board is powered from the ArC Two main board. The controls for the main instrumentation board and the TXL demonstrator board are provided through PC. The input is provided by and the output is readout by the instrumentation board through appropriate setup of the flexible SMUs. (b) A closer looks at the custom demonstrator board used to test real RRAM devices with the TXL circuit. (c) An example operation of the TXL demonstration board is shown. Three different sets of sampled signals are used to test the cases where a hit in the first template, a hit in the second template and no hit is computed. The sets of query inputs are captured through the trigger and sample and hold circuit. The results of each comparison are represented as current readout and the use of a software threshold policy classifies the result per matchline as match or mismatch depending if above or below a predefined threshold value. The threshold value can be arbitrarily define and in this experiment the value of $V_{T H} = 700 u A$ was selected. (d) A concept diagram showcasing the expected application. The TXL array implement a template matching engine capable of near-sensor classification. (e) A pre-processing unit is responsible for triggering the S&H circuit and the samples are provided to the array.

In Figure 7d we showcase a concept application of TXL array as part of a bio-signal classification engine, such as in the case of spike sorting template matching. In the light blue coloured area is the demonstrator board circuits to be tested in hardware using real RRAM devices for the TXL array, while the surrounding functions (sense, pre-processing, readout classification) are implemented in software. Part of the hardware implementation for the demonstrator board was the triggering and sample and hold circuit (shown in Figure 7e) which can be operated when using an example waveform. The triggering circuit uses a voltage threshold trigger to identify when it needs to start sampling the input waveform. The sample and hold circuit is getting loaded with equally timed samples of the input waveform. The control of the sampling through timers and additional logic is also implemented on-board.

As shown in Figure 7c, we are capturing distinct samples of a continuous signal, storing each one in a sample and hold cell. During the parallel search operation, the sampled voltages (forming an analogue query vector) are supplied as inputs to the two TXL rows with each row representing a stored template. Depending on each stored template a rate of match is calculated in analogue signal domain based on the match events accumulating for each template. Practically, the matching rate for each comparison of the query input with each template appears as current per matchline. The current is measured through ArC TWO Instrumentation Board SMUs that have been flexibly configured through the versatile ArC TWO readout channel mapping. We can define a threshold on each matchline which if surpassed will indicate a match event for the query. Otherwise, a mismatch occurs between the query and the template. The decision on the matching threshold can be decided arbitrarily depending on the specific hardware configuration of the implemented array as well as the application in question. It is possible through the matching threshold to perform an exact or approximate matching operation.

Through this demonstrator board we are able to initialise our investigation into the dynamics of the TXL circuits as part of larger ACAM arrays. The demonstrator board has been designed to accommodate extensive design for testing capabilities by enabling the access to all important nodes of each TXL cell for probing and thus enables the full control, characterisation and testing of the circuits and RRAM devices under test. Hence, a faster characterisation and calibration to ACAM operation cycle can be performed, accelerating the experimental process. The current demonstrator board enabled us to extract useful information to guide our approach when integrating a high number of RRAM devices as Back-End-Of-Line (BEOL) integration towards developing hybrid RRAM-CMOS ICs.

5 Circuit simulations

5.1 6T2R2M cell performance analysis

To assess integrability estimate the power consumed by the circuit, a schematic level integrated circuit simulation using the Cadence Virtuoso environment and a commercially available 180 nm CMOS technology product development kit (PDK) was developed. This technology node includes planar MOSFETs, for either $1.8 V$ or $5 V$ . For simulating the RRAM devices, we used a RRAM model based on the work of (Messaris et al., 2018). As integrated design provides much greater control of the specifics of the transistors used, three configurations of the proposed circuit were designed using different geometry parameters for employed the MOSFETs devices in the two hybrid RRAM-CMOS inverters (as shown in Table 1): (Wilkes, 1995) with minimum size input transistors (McKee, 2004), with wide input transistors, and (Menon et al., 2021) with minimum size native NMOS and conventional PMOS input transistors. These configurations investigate three paths for development. The minimum size should have the lowest energy usage of any configuration of this circuit, although other aspects of performance might suffer. The higher gain offered by the wider FETs could offer a balance between energy usage and performance. The low threshold of the natively doped FETs could offer improved performance despite using minimum size native FETs. In all cases the two transistors on the output branch were minimum size. All MOSFET components used for the showcased Spectre-SPICE simulations are $1.8 V$ devices. While it may be possible to replace the balancing resistors with static memristors, the assumption is that polysilicon resistors will be used. This constraint makes the $1 MΩ$ balancing resistors of the PCB model wildly impractical, due to the area required. The simulations shown here were conducted with $10 kΩ$ balancing resistors and memristors of $2.5 kΩ$ to $250 kΩ$ . Note that the memristor model used has the specified conductance at $0.5 V$ and has conductance closer to a $1 kΩ$ to $100 kΩ$ resistor at the inflection point of the inverters. In addition to this, an extra gate was added to both transistors in the output stage, to allow the output to be disabled when the input was being adjusted or the circuit shut down. While the move to a higher current operating point strips the circuit of a significant portion of its range, it allows for the recovery of much of the sharpness that is now expected to be lost when implemented with memristors. It should be noted that a reduced range is not necessarily a problem, as the circuits before the CAM cell in the signal chain can be designed to output in the operating range of the CAM cell, and may even have their own headroom requirements that make a wider window unnecessary.

Table 1

Table 1. Chart of transistor parameters in the three circuit configurations that were simulated in Cadence Virtuoso using a commercial 180 nm technology.

5.2 Performance assessment

To assess the matching window maximum width, static DC simulations of the circuits were run. The circuits were supplied at $1.8 V$ and the inputs swept from $0 V$ to the supply voltage, measuring the current measured at the output. This was repeated for geometrically spaced values of $2.5 kΩ$ to $250 kΩ$ for the memristor. In all cases the output was connected to a $100 kΩ$ dummy load. The window widths were calculated using the same differential method mentioned in Section 4.

In the circuit configuration with minimum size input transistors, the window width was drastically reduced (Figure 9a). Furthermore, the range of values for the lower thresholds was roughly half the maximum window width (Figure 8a). While the transitions were sharper, this reduced range represents a limitation in circuit’s functionality, as a narrow window can only be set in a $100 mV$ range. The wide input stage circuit showcased better matching window configurability, losing less maximum width and retaining most of the threshold sweep (Figure 8b). The sharpness of the transitions was also superior to the minimum width circuit, due to the higher gain of the input FETs (Figure 9b). The native input stage circuit displayed a wider maximum window width than even the wide circuit, although the minimum size PMOS left it with a lower threshold range comparable to the minimum circuit (Figure 8c). The matching window transitions in the native circuit were noticeably less sharp (Figure 9c). Compared to the PCB implementations, all configurations the window was biased towards the bottom of the supply range. In the minimum and wide configurations, this is a result of equally sized complementary FETs. The lower conductivity of PMOS in integrated design shifts the average threshold lower, skewing the window position. In the native configuration a similar but more pronounced effect was caused by the lower threshold voltages of native devices, the window of the native circuit was much closer to $0 V$ than the other configurations, which were almost centred in the supply range. In all circuits, plotting the window width as a function of the logarithm of M1 showed less non-linearity than the lower threshold of the PCB model (Figure 8d). The differences in the behaviour of the top threshold between the three circuit configurations were less pronounced (Figure 9b). All three were able to bring the top threshold to within $100 mV$ of the bottom threshold. The only significant differences were the inability of the minimum width circuit to reach the maximum window widths of the other two, and the less defined transitions of the native circuit.

Figure 8

Figure 8. (a) A graph of the IV characteristics of the minimum circuit, sweeping M1 from $2.5 kΩ$ to $250 kΩ$ (b) An identical sweep of M1 with the wide circuit. (c) An identical sweep of M1 with the native circuit. (d) The window width of all three configurations as a function of M1.

Figure 9

Figure 9. (a) A graph of the IV characteristics of the minimum circuit, sweeping M2 from $2.5 kΩ$ to $250 kΩ$ (b) An identical sweep of M2 with the wide circuit. (c) An identical sweep of M2 with the native circuit. (d) The window width of all three configurations as a function of M2.

5.3 Energy dissipation

To estimate the energy required to test a single sample a transient analysis simulation was run, with a pulsed mode of operation as might be used in a future implementation. In this test, the circuit began with the supply energised, the output disabled through the gates added to the output FETs, and the input parked at either supply or ground. The minimum and wide circuits were parked at ground and the native circuit parked at the supply voltage. The input was brought to the test value and held there for $2.35 ns$ to settle. The output enable signal was then pulsed for $450 ps$ . $200 ps$ after the enable pulse ends, the input is returned to its parked state. The energy was calculated by integrating the sum of the instantaneous power at the input, the enable input, and the supply. As might be expected for an analysis including the output current, the test energy of a hit was the peak energy in all cases (Figure 10), with a hit at the threshold being higher than a hit in the middle of the window. This is because an input at a threshold results in a low conductance path from supply to ground for the short duration that the input is active. It is worth noting that the miss energy on the opposite side of the window to the parked state was substantially higher than the miss energy on the same side of the window. This observation is probably because bringing the input signal to the test voltage and back causes the input to pass through both input inverter’s crossover points. This would cause a small surge of current each time, resulting in the observed plateau.

Figure 10

Figure 10. A graph of the test energy of a single simulated 6T2M2R ACAM cell with maximum window width, as a function of input voltage.

Further simulation data was gathered for relevant process corners (Table 2). A corner is a set of environmental or process conditions that might affect circuit performance. Each of the circuits was simulated at four corner conditions: room temperature, body temperature, PMOS/NMOS above specified gain (Fast/Fast), and PMOS/NMOS below specified gain (Slow/Slow) (Table 3). A hit and a miss voltage were chosen for each circuit, to give estimates for a typical use case. The variation with temperature was minimal, but the effect of process variation was substantial, with a Fast/Fast hit dissipating almost twice the energy of a Slow/Slow hit.

Table 2

Table 2. Maximum matching window width of the proposed ACAM cell at several 180 nm CMOS technology process corners.

Table 3

Table 3. Energies calculated at typical input stimulus voltages for several process corners in both hit and miss cases.

5.4 Process variation effects

To obtain a clearer picture of the impact of process variation, a Monte Carlo (MC) analysis was conducted in Cadence Virtuoso simulation environment. This approach uses a large number of simulations with randomised process variation to provide a statistical model of the expected impact. In this test, the characteristic being monitored is the maximum window width, calculated using the same differential method used earlier. To reinforce the statistical importance of our results, 250 point simulations were run for each circuit configuration. In Figure 11 the aggregate results for the MC simulation are exhibited.

Figure 11

Figure 11. (a) A histogram plot and fit of a 250 point MC simulation investigating the maximum window width when process variations are considered. The process variations affect only the MOSFET and polysilicon balancing resistor components of the circuit design. (b) A table of the fit parameters $μ$ and $σ$ , assuming a normal distribution.

The resulting histograms of window width show a significant level of variation in the minimum and native circuits. If an error in fabrication causes a gate element to be $20 nm$ narrower than intended, the proportional error will be much larger on a minimum size FET than a wider one, so greater variance on the minimum size circuit is to be expected. Why the native circuit has slightly greater variance is not entirely clear, as the larger minimum size of the native FETs should result in marginally less variance, but the low doping level of the native devices likely makes them more vulnerable to fabrication errors there. The wide circuit showed the least variance, both in absolute terms and in proportion of average maximum window width.

6 Conclusion and discussion

In this work, we are proposing a circuit for template matching operation based on RRAM technology. RRAM devices are employed to map analogue weights into configurable thresholds for ACAM applications. The circuit considered in this work demonstrates a highly controllable and flexible analogue content addressable memory cell with acceptable power dissipation. As part of the energy-efficient design methodology we followed for this work, the circuit uses a current-mode charging scheme for the matchline instead of a more conventional precharged matchline with each cell discharging if there is a mismatch. Additionally, this enables the generation of an analogue output per template, which can be used to calculate the distance between the query input and the stored template. The proposed circuit can be used in many different neuro-inspired circuit and systems, such as a wide variety of ANNs, but the main focus of this circuit to perform template matching operations in a highly configurable and noise-tolerant approach makes it a promising candidate for in-sensor or near-sensor analogue signal processing at the edge, such as for the case of near-sensor neural spike sorting systems. Compared to the reconfigurable logic gates this work was developed from (Serb et al., 2018), this work demonstrates a greater degree of controllability for a similar device count and area. The match window can retain its configurability and sharp selection even with a smaller conductance ratios compared to prior work thus enabling a easier integration with a wide variety of potential RRAM technologies. Experimental testing with different RRAM technologies to assess the performance of the circuit under different operational conditions is planned as future work. This is possible due to the design of technology-agnostic PCB test and demonstrator board as shown in this work.

In future work, we are planning to further improve the current design and work towards a prototype integrated solutions to test a larger number of RRAM devices and TXL circuits using the existing framework. Additionally, it is worth noting that as the ratio between dynamic and balancing elements is the only control parameter that has a significant effect, $R_{M 1}$ and $R_{1}$ could swap positions. This would allow for both thresholds to have identical control methods, at the trivial cost of inverting the control of one threshold. With both dynamic elements connected to ground, the devices could be written through the diode structures formed by the triple well of the neighbouring NMOS, although this method has yet to be demonstrated. The power and area requirements for the circuit could be further improved by switching to a smaller technology node, such as $22 nm$ . Such circuit miniaturisation regarding its CMOS part will require a careful investigation in order to ensure the continuity of co-integration viability with RRAM technologies, such as the $T i O_{x}$ -based RRAM devices used in this work. Through the demonstrator board we showed how these low power window comparators might be arrayed to perform near sensor classification tasks. These experiments enabled us to further mature our design efforts for a higher level of integration of RRAM with CMOS through BEOL monolithic integration.

Data availability statement

The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author.

Author contributions

PF: Writing – original draft, Writing – review and editing. GP: Visualization, Writing – original draft, Writing – review and editing. AS: Conceptualization, Supervision, Writing – review and editing. SS: Resources, Software, Writing – review and editing. TP: Conceptualization, Supervision, Writing – review and editing.

Funding

The author(s) declare that financial support was received for the research and/or publication of this article. This work was supported in part by the Engineering and Physical Sciences Research Council (EPSRC) Programme under Functional Oxide Reconfigurable Technologies (FORTE) Grant EP/R024642/2, the EPSRC ProSensing Project (Grant No. EP/Y030176/1) and the RAEng Chair in Emerging Technologies under Grant CiET1819/2/93.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Generative AI was used in the creation of this manuscript.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/felec.2025.1568377/full#supplementary-material

References

Arsovski, I., Chandler, T., and Sheikholeslami, A. (2003). A ternary content-addressable memory (tcam) based on 4t static storage and including a current-race sensing scheme. IEEE J. Solid-State Circuits 38, 155–158. doi:10.1109/jssc.2002.806264806264

CrossRef Full Text | Google Scholar

Foster, P., Huang, J., Serb, A., Stathopoulos, S., Papavassiliou, C., and Prodromakis, T. (2022). An fpga-based system for generalised electron devices testing. Sci. Rep. 12, 13912. doi:10.1038/s41598-022-18100-3

PubMed Abstract | CrossRef Full Text | Google Scholar

Graf, H., and Jackel, L. (1989). Analog electronic neural network circuits. IEEE Circuits Devices Mag. 5, 44–49. doi:10.1109/101.29902

CrossRef Full Text | Google Scholar

Graves, C. E., Li, C., Sheng, X., Miller, D., Ignowski, J., Kiyama, L., et al. (2020). In-memory computing with memristor content addressable memories for pattern matching. Adv. Mater. 32, e2003437. doi:10.1002/adma.202003437

PubMed Abstract | CrossRef Full Text | Google Scholar

Hock, M., Hartel, A., Schemmel, J., and Meier, K. (2013). “An analog dynamic memory array for neuromorphic hardware,” in 2013 European conference on circuit theory and design (ECCTD), 1–4. doi:10.1109/ECCTD.2013.6662229

CrossRef Full Text | Google Scholar

Karam, R., Puri, R., Ghosh, S., and Bhunia, S. (2015). Emerging trends in design and applications of memory-based computing and content-addressable memories. Proc. IEEE 103, 1311–1330. doi:10.1109/JPROC.2015.2434888

CrossRef Full Text | Google Scholar

Li, C., Graves, C., Sheng, X., Miller, D., Foltin, M., Pedretti, G., et al. (2020). Analog content-addressable memories with memristors. Nat. Commun. 11, 1638. doi:10.1038/s41467-020-15254-4

PubMed Abstract | CrossRef Full Text | Google Scholar

Lin, C. S., Chang, J. C., and Liu, B. D. (2003). A low-power precomputation-based fully parallel content-addressable memory. IEEE J. Solid-State Circuits 38, 654–662. doi:10.1109/JSSC.2003.809515

CrossRef Full Text | Google Scholar

Liu, H. (2002). Routing table compaction in ternary cam. IEEE Micro 22, 58–64. doi:10.1109/40.988690

CrossRef Full Text | Google Scholar

McKee, S. A. (2004). “Reflections on the memory wall,” in Proceedings of the 1st conference on computing frontiers, 162–167. doi:10.1145/977091.977115

CrossRef Full Text | Google Scholar

Menon, A., Sun, D., Aristio, M., Liew, H., Lee, K., and Rabaey, J. M. (2021). “A highly energy-efficient hyperdimensional computing processor for wearable multi-modal classification,” in BioCAS 2021 - IEEE biomedical circuits and systems conference, proceedings. doi:10.1109/BioCAS49922.2021.9645008

CrossRef Full Text | Google Scholar

Messaris, I., Serb, A., Stathopoulos, S., Khiat, A., Nikolaidis, S., and Prodromakis, T. (2018). A data-driven verilog-a reram model. IEEE Trans. Computer-Aided Des. Integr. Circuits Syst. 37, 3151–3162. doi:10.1109/TCAD.2018.2791468

CrossRef Full Text | Google Scholar

Michalas, L., Stathopoulos, S., Khiat, A., and Prodromakis, T. (2018). Conduction mechanisms at distinct resistive levels of Pt/TiO2-x/Pt memristors. Appl. Phys. Lett. 113, 143503. doi:10.1063/1.5040936

CrossRef Full Text | Google Scholar

Pan, Y., Foster, P., Huang, J., Serb, A., and Prodromakis, T. (2021). An rram-based associative memory cell.

Google Scholar

Pedretti, G., Graves, C., Serebryakov, S., Mao, R., Sheng, X., Foltin, M., et al. (2021a). Tree-based machine learning performed in-memory with memristive analog cam. Nat. Commun. 12, 5806. doi:10.1038/s41467-021-25873-0

PubMed Abstract | CrossRef Full Text | Google Scholar

Pedretti, G., Graves, C. E., Serebryakov, S., Mao, R., Sheng, X., Foltin, M., et al. (2021b). Tree-based machine learning performed in-memory with memristive analog CAM. Nat. Commun. 12, 5806. doi:10.1038/s41467-021-25873-0

PubMed Abstract | CrossRef Full Text | Google Scholar

Pedretti, G., Graves, C. E., Van Vaerenbergh, T., Serebryakov, S., Foltin, M., Sheng, X., et al. (2022). Differentiable content addressable memory with memristors. Adv. Electron. Mater. 8, 2101198. doi:10.1002/aelm.202101198

CrossRef Full Text | Google Scholar

Rajaei, R., Sharifi, M. M., Kazemi, A., Niemier, M., and Hu, X. S. (2021a). Compact single-phase-search multistate content-addressable memory design using one fefet/cell. IEEE Trans. Electron Devices 68, 109–117. doi:10.1109/TED.2020.3039477

CrossRef Full Text | Google Scholar

Rajaei, R., Sharifi, M. M., Kazemi, A., Niemier, M., and Hu, X. S. (2021b). Compact single-phase-search multistate content-addressable memory design using one fefet/cell. IEEE Trans. Electron Devices 68, 109–117. doi:10.1109/TED.2020.3039477

CrossRef Full Text | Google Scholar

Sebastian, A., Le Gallo, M., Khaddam-aljameh, R., Eleftheriou, E., Gallo, M. L., Khaddam-aljameh, R., et al. (2020). Memory devices and applications for in-memory computing. Nat. Nanotechnol. 15, 529–544. doi:10.1038/s41565-020-0655-z

PubMed Abstract | CrossRef Full Text | Google Scholar

Serb, A., Khiat, A., and Prodromakis, T. (2018). Seamlessly fused digital-analogue reconfigurable computing using memristors. Nat. Commun. 9, 2170. doi:10.1038/s41467-018-04624-8

PubMed Abstract | CrossRef Full Text | Google Scholar

Toshiba. SSM6L36TU LF datasheet (2014).

Google Scholar

Wilkes, M. (1995). “The memory wall and the CMOS end-point,” in ACM SIGARCH computer architecture news, 1994–1996. doi:10.1145/218864.218865

CrossRef Full Text | Google Scholar

Yin, X., Li, C., Huang, Q., Zhang, L., Niemier, M., Hu, X., et al. (2020). Fecam: a universal compact digital and analog content addressable memory using ferroelectric. IEEE Trans. Electron Devices 67, 2785–2792. doi:10.1109/ted.2020.2994896

CrossRef Full Text | Google Scholar

Zaffora, A., Di Franco, F., Macaluso, R., and Santamaria, M. (2021). “17 - tio₂ in memristors and resistive random access memory devices,” in Titanium dioxide (TiO₂) and its applications (elsevier), metal oxides. Editors F. Parrino, and L. Palmisano, 507–526. doi:10.1016/B978-0-12-819960-2.00020-1

CrossRef Full Text | Google Scholar

Keywords: associative memory, content addressable memory, resistive RAM, RRAM-CMOS design, template matching

Citation: Foster P, Papandroulidakis G, Serb A, Stathopoulos S and Prodromakis T (2025) An RRAM-based implementation of a template matching circuit for low-power analogue classification. Front. Electron. 6:1568377. doi: 10.3389/felec.2025.1568377

Received: 29 January 2025; Accepted: 15 April 2025;
Published: 30 April 2025.

Edited by:

Graziella Scandurra, University of Messina, Italy

Reviewed by:

Baljit Kaur, National Institute of Technology Delhi, India
Leibin Ni, Huawei Technologies, China

Copyright © 2025 Foster, Papandroulidakis, Serb, Stathopoulos and Prodromakis. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Patrick Foster, cGZvc3RlcjRAZWQuYWMudWs=

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.