Statistical Analysis Based Feature Selection Enhanced RF-PUF With > 99.8% Accuracy on Unmodified Commodity Transmitters for IoT Physical Security

Due to the diverse and mobile nature of the deployment environment, smart commodity devices are vulnerable to various spoofing attacks which can allow a rogue device to get access to a large network. The vulnerability of the traditional digital signature-based authentication system lies in the fact that it uses only a key/pin, ignoring the device fingerprint. To circumvent the inherent weakness of the traditional system, various physical signature-based RF fingerprinting methods have been proposed in literature and RF-PUF is a promising choice among them. RF-PUF utilizes the inherent nonidealities of the traditional RF communication system as features at the receiver to uniquely identify a transmitter. It is resilient to key-hacking methods due to the absence of secret key requirements and does not require any additional circuitry on the transmitter end (no additional power, area, and computational burden). However, the concept of RF-PUF was proposed using MATLAB-generated data, which cannot ensure the presence of device entropy mapped to the system-level nonidealities. Hence, an experimental validation using commercial devices is necessary to prove its efficacy. In this work, for the first time, we analyze the effectiveness of RF-PUF on commodity devices, purchased off-the-shelf, without any modifications whatsoever. We have collected data from 30 Xbee S2C modules used as transmitters and released as a public dataset. A new feature has been engineered through PCA and statistical property analysis. With a new and robust feature set, it has been shown that 95% accuracy can be achieved using only ∼1.8 ms of test data fed into a neural network of 10 neurons in 1 layer, reaching > 99.8% accuracy with a network of higher model capacity, for the first time in literature without any assisting digital preamble. The design space has been explored in detail and the effect of the wireless channel has been investigated. 
The performance of some popular machine learning algorithms has been tested and compared with the neural network approach. A thorough investigation of various PUF properties has been done. With extensive testing of 41238000 cases, the detection probability for RF-PUF for our data is found to be 0.9987, which, for the first time, experimentally establishes RF-PUF as a strong authentication method. Finally, the potential attack models and the robustness of RF-PUF against them have been discussed.


INTRODUCTION
The fourth industrial revolution, powered by low-power, high-speed modern communication systems, has ushered in a new era of immersive user experience through smart devices that are connected with each other and to the cloud, popularly known as the Internet of Things (IoT). The global IoT market is growing rapidly and, according to a prediction by Norton, there will be around 21 billion connected devices by 2025 [1]. The term Internet of Everything (IoE) is already in use; it refers to people, data, and smart things connected to form an ecosystem that ensures a better and smarter lifestyle. The diverse application environment of smart devices exposes them to a wide attack surface. The weakest point in a network defines its security, and for IoT networks the resource-limited user-end devices are the weakest points: a security compromise there can give a rogue device access and pose a massive threat to the whole network. Secure authentication before granting access to a large network is therefore of increasing importance.
Traditional methods such as symmetric-key and asymmetric-key cryptography use secret private keys or public/private key pairs, respectively, for encryption/decryption. Key-based methods require the storage of a secret key in a nonvolatile memory (NVM) or SRAM, which makes them vulnerable to invasive/semi-invasive key-hacking attacks and side-channel attacks [2,3,4]. Multi-factor authentication (MFA) [5,6] requires one or more verification factors (e.g., a biometric factor or a two-factor code from an authentication app) along with the secret key. The widely used open authentication (OAuth 2.0) protocol [7] for current IoT networks suffers from cross-site request forgery (CSRF) attacks [8,9]. Both OAuth and MFA are inconvenient for large networks as they require manual verification. In addition to these vulnerabilities, digital signatures impose an additional power and area burden, which is typically small but can be significant for extremely energy- and resource-constrained edge devices.
To circumvent this, the idea of the radio frequency physical unclonable function (RF-PUF) has recently been proposed [10], using a physical signature instead of or in addition to the digital signature. The concept of RF-PUF is explained in Fig. 1. RF-PUF exploits the inherent device imperfections due to manufacturing process variation and other system-level nonidealities (e.g., LO frequency offset, I-Q mismatch, DC offset, attenuation, fading, Doppler shift, etc.) as unique physical signatures. These signatures are used as features and fed to a neural network at the receiver to train it. Once trained, this network can be employed at the receiver for authentication. RF-PUF does not demand any additional preamble, digital keys, or assistive communication medium for authentication purposes. The absence of an external security key or preamble makes RF-PUF highly resilient to different types of key-hacking attacks and alleviates the need for preamble obfuscation [11]. Also, it does not require any secured memory block for key storage. Thus, both power and area overhead are reduced on the resource-constrained edge-node side of an asymmetric IoT network.
In [10], the idea of RF-PUF was presented primarily based on simulation data using I-Q samples as features. However, the PUF output is stochastic in nature and it is very hard to accurately capture the device nonidealities in simulation. This leaves two open research needs: experimental validation of RF-PUF and demonstration of high accuracy on devices found 'in-the-wild'. In this work, we address both problems by a) analyzing the efficacy of RF-PUF on unmodified commodity devices and b) introducing effective feature selection to increase RF-PUF accuracy to > 99.8%. To achieve this, an improved and robust feature set is necessary to provide a reliable authentication method. We purchased 30 commercially available Xbee S2C devices and used them as unmodified commercial off-the-shelf (COTS) devices to experimentally validate RF-PUF. 155.4 GB of data have been collected from the Xbee transceiver systems and 2.5 GB of data have been used for experimentation. This dataset has also been made public on GitHub along with this paper, for further development and validation by the RF-security community.
It has been shown that 95% accuracy can be achieved even with a lightweight, single-layer neural network with 10 neurons and ∼ 1.8 ms (30 kB) of test data, which ensures the feasibility of RF-PUF in a low-latency network. Through statistical analysis, a new feature has been added that massively boosts the performance of the network. The impact of the neural network model capacity and the amount of training data on detection accuracy has been explored. Along with artificial neural networks, experiments have been performed with multiple traditional machine learning algorithms, and their performance is compared in terms of the number of devices. A detailed analysis of the PUF properties has been done to evaluate the eligibility of RF-PUF as a PUF. Inter-PUF and intra-PUF hamming distances have been calculated, and it has been shown that for unmodified commercial off-the-shelf (COTS) devices, RF-PUF provides strong identifiability with a very high (99.87%) detection probability. As an authentication method, possible vulnerabilities and attack models for RF-PUF have been investigated and the robustness of RF-PUF against them has been examined. The insights gathered from these analyses and experiments may prove important for the design and implementation of RF-PUF in realistic application scenarios with 'in-the-wild' devices.

Our Contribution
In this work, through thorough statistical analysis of unmodified commodity devices, we have found an optimum feature that significantly improves the accuracy of RF-PUF on a suite of commodity hardware devices, leading to > 99.8% accuracy, along with PUF property analysis and security vulnerability analysis. Detailed contributions are as follows:
1) Feature engineering: Principal component analysis has been performed on the existing feature set found in the literature to find the dominant feature. Through moment analysis on the dominant feature (i.e., carrier frequency offset) we demonstrate that the addition of a feature called COV (ratio of standard deviation and mean of carrier frequency offset) significantly helps in achieving high (> 99.8%) accuracy (Section 4.3).
2) Highest accuracy achieved with unmodified COTS devices: 30 Xbee S2C modules have been used without the help of any assisting communication preamble or any modification to the devices whatsoever. Using data received over a wireless channel with a suitable feature set and a lightweight neural network, 99.8% accuracy can be achieved which, to the best of our knowledge, is the highest accuracy using this many commodity COTS devices considering the wireless channel (Section 4.4).
3) RF-PUF established as a strong PUF: Any distinct PUF class is identified through some properties that make it a separate class. They include constructability, evaluability, uniqueness, reliability, and identifiability. We have explored these properties for RF-PUF in detail, calculated intra-PUF and inter-PUF hamming distances and, in an extensive test of 41238000 cases, shown that the probability of proper identification of an RF-PUF instance is 0.9987. This is the first analysis of RF-PUF as a PUF class, and it experimentally demonstrates RF-PUF as a strong and unique PUF class by itself (Section 6).
4) Performance evaluation using popular machine learning algorithms and comparison with the neural network (NN) based approach: It has been shown that even a lightweight NN with a single hidden layer can handle > 300 devices with 99.9% accuracy, unlike ML algorithms (Section 5.4).
5) Wireless channel variability analysis: The effect of the wireless channel on the accuracy of RF-PUF, and the effect of network depth on accuracy with and without the wireless channel, have been presented, together with a discussion of possible important attack models and the robustness of RF-PUF against such attacks (Section 5.5).
6) Public dataset: Our collected data have been released as a public dataset for the whole community to explore and experiment with (Section 3.3).

RELATED WORKS
Time and frequency domain properties of individual transmitters have been used for RF fingerprinting [12,13,14,15,16,17,18,19,20]. However, both time and frequency domain analysis have their limitations in the form of detecting the start and end of the transients, high oversampling ratios, and the need for fixed preambles to avoid data dependency. MAC layer and other upper layers of the communication protocol have also been used for RF fingerprinting [21]. However, device identifiers in upper layers such as IMEI number, IP address, MAC address, etc. can be spoofed [22,23,24,25]. Hanna et al. utilized power amplifier nonlinearity to fingerprint RF devices [26] using simulation data. Recently, a growing number of works use raw RF data and depend on complex neural networks to classify devices [27,28,29,30]. This method has one weak point. As wireless data are contaminated with noise and interference, any use of the RF data without processing carries a risk of a large performance drop in scenarios where environmental nonidealities go beyond the estimate that was used while designing the network. Contaminated data can produce faulty predictions, especially if the training environment is significantly different from the test environment. Another concern is that this approach is somewhat blind and does not provide intuition about the different design parameters and their effects. Processing the data, extracting a proper feature set, and exploring the design space can yield a robust authentication method that is less vulnerable to environmental factors and provides more flexibility to the designer. This is why RF-PUF performs better than the CNN-based approach, as shown in [31]. Brik et al. [32] used IEEE 802.11 devices to show 99% detection accuracy. However, the wireless channel was ignored completely. In our work, using data from 30 commercial Xbee devices and considering the wireless channel in conjunction with a lightweight NN, we have shown that we can achieve > 99.8% accuracy.

Figure 3. Grouping collected data into a number of frames and filtering the data for acceptable frames. This step is required as the Xbee module transmits data at intervals.

Physical Device Setup
For experimental validation, 30 XBee S2C modules (IEEE 802.15.4 standard), which are designed for industrial and commercial use, are chosen. Using an SMA cable, a HackRF One software-defined radio (SDR) module is connected either to the TX (case 1) or to the RX (case 2) to capture data excluding (case 1) or including (case 2) the wireless channel.

Data Collection and Filtering Noise
A 31-bit pseudo-random bit sequence (PRBS) is generated in MATLAB and fed to each TX, which transmits this data for 60 s with QPSK modulation at 2.465 GHz and a 230400 bps baud rate. These data were captured by an Xbee RX module. Simultaneously, data were also captured by a HackRF One software-defined radio (SDR) module, sampled at 6 MSps, and stored by GNU Radio. The captured data are divided into several frames, each containing a number of samples. From the constellation diagram of the frame data (Fig. 3), it is found that some frames have no significant data points and contain only noise, as the Xbee devices transmit data intermittently due to their buffer limitation. These blank frames containing only noise were discarded.
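The framing and blank-frame rejection described above can be sketched as follows. This is only an illustration: the text identifies noise-only frames from the constellation diagram, whereas the sketch assumes a simple mean-power threshold, and the function names, the threshold value, and the toy capture are all hypothetical.

```python
import random

def split_into_frames(samples, frame_len):
    """Split complex I-Q samples into equal-length frames; a leftover tail is dropped."""
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]

def mean_power(frame):
    """Average of |s|^2 over the samples of one frame."""
    return sum(abs(s) ** 2 for s in frame) / len(frame)

def keep_active_frames(frames, power_threshold):
    """Keep frames whose mean power exceeds the (assumed) threshold."""
    return [f for f in frames if mean_power(f) > power_threshold]

# Toy capture: one noise-only frame followed by one QPSK-like burst with noise.
rng = random.Random(0)
noise = [complex(rng.gauss(0, 0.05), rng.gauss(0, 0.05)) for _ in range(3600)]
points = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
burst = [rng.choice(points) / abs(1 + 1j)
         + complex(rng.gauss(0, 0.05), rng.gauss(0, 0.05)) for _ in range(3600)]

frames = split_into_frames(noise + burst, 3600)
active = keep_active_frames(frames, power_threshold=0.1)
```

In the toy capture the noise-only frame has a mean power far below the threshold and the burst frame has a mean power close to 1, so only the burst survives. A real capture would need a threshold tuned to the receiver gain.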

Public Dataset
This dataset contains raw data collected from 30 Xbee S2C transmitters for both cases (excluding and including the channel) in binary format. The total size of the dataset is 155.4 GB (the data for each transmitter is ∼2.5 GB). It can be downloaded from 'Sparclab RF-PUF Dataset [33]'.

Initial Feature Set
In our work, CFO and I-Q data are taken as features, just as in the original RF-PUF paper [10]. The previously generated frames are filtered using matched filtering, frequency compensated (both fine and coarse), and finally synchronized using timing recovery. In this process, CFO is found as a byproduct. Along with CFO, the compensated in-phase and quadrature-phase components in four quadrants are used as features. The 9 features (CFO + 4 I-components + 4 Q-components) from each frame and 1000 frames from each TX lead to a feature set of 9 × 1000. The final feature matrix is a combination of these feature sets from all 30 devices and has a size of 9 × 30000.

Accuracy with Carrier Frequency Offset and I-Q Features
The feature data are divided into 70%, 15%, and 15% for training, validation, and test purposes, respectively, and fed into a neural network (NN). The performance of the neural network is tested by varying the number of neurons and hidden layers. Fig. 4(a) shows the accuracy of the trained model for different neural networks. The accuracy is less than 75% in all test cases. Since exploring different NN configurations does not provide the expected accuracy, our options are to: (i) form an improved feature set to be used with the NN, (ii) use different machine learning (ML) algorithms, or (iii) use more data. We first search for an improved feature set for better accuracy. Later, the effect of more data is shown in subsections 5.1 and 5.2, and a comparison of different ML algorithms and the NN is discussed in subsection 5.4.
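A minimal sketch of the 70%/15%/15% partition is given below. It assumes a random shuffle of the 30000 feature columns with a fixed seed; the text does not state how the partition was drawn, and `split_indices` is a hypothetical helper.

```python
import random

def split_indices(n_items, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle item indices and cut them into train / validation / test lists."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)  # assumption: random partition, fixed seed
    n_train = round(n_items * train_frac)
    n_val = round(n_items * val_frac)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

# 30 transmitters x 1000 frames = 30000 feature columns.
train, val, test = split_indices(30000)
```

With 30000 columns this yields 21000 training, 4500 validation, and 4500 test columns.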

Principal Component Analysis
We start the investigation by performing Principal Component Analysis (PCA) with the feature matrix as input (each feature represents one input dimension). Fig. 4(b) shows the principal components and their contribution to the variance. The first principal component (PC) accounts for most of the variance, and the input-to-PC mapping reveals that the CFO is the most dominant feature. So, an in-depth statistical property analysis of the CFO can help in deriving a new feature.

Moment Analysis
Since CFO varies from frame to frame (i.e., with time), it is intuitive to look at the moments of its distribution. Specifically, we look at the first- and second-order moments (mean and variance). Fig. 4(c) shows the absolute values of the mean and standard deviation (square root of variance) of CFO. These parameters vary significantly from TX to TX in most cases. Even if the mean is similar for two TX, the standard deviation is different, and vice versa. If they can be combined to form a new feature, it can provide significant discrimination among transmitters and lead to much better accuracy. In statistics, the ratio of standard deviation and mean is known as the coefficient of variation. Using this statistical parameter, we form a new feature named the coefficient of frequency offset variation (COV), which is

$$COV = \frac{\sigma_{CFO}}{\mu_{CFO}}$$

where $\sigma_{CFO}$ and $\mu_{CFO}$ are the standard deviation and mean of the CFO. COV is included as the tenth feature in our existing feature matrix. The PCA has already revealed that the I-Q features contribute much less variance and can be discarded by trading some accuracy. Since our goal is to achieve the maximum possible accuracy, we still keep them as features. Also, I-Q values contain channel information, which helps the NN to compensate for the wireless channel (the channel effect is explained in subsection 5.5).
After including COV as the tenth feature, our neural network was trained, validated, and tested again with the new feature matrix. Fig. 4(d) shows that the performance of the network has improved drastically. With just a single hidden layer, > 95% accuracy can be achieved using 10 neurons, and 99.9% accuracy can be reached by increasing the number of neurons.
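As an illustration of the COV feature, the sketch below computes the ratio of standard deviation to mean for a group of CFO values. Two choices are assumptions rather than details from the text: the population standard deviation is used, and the mean is taken in absolute value (following the absolute values plotted in Fig. 4(c)). The CFO numbers are made up.

```python
import statistics

def coefficient_of_variation(cfo_values):
    """Ratio of standard deviation to |mean| of a group of CFO estimates."""
    mean = statistics.fmean(cfo_values)
    std = statistics.pstdev(cfo_values)  # assumption: population standard deviation
    return std / abs(mean)               # assumption: absolute value of the mean

# Two hypothetical transmitters with the same mean CFO but different spread.
cfo_tx_a = [100.0, 102.0, 98.0, 101.0, 99.0]
cfo_tx_b = [100.0, 110.0, 90.0, 105.0, 95.0]

cov_a = coefficient_of_variation(cfo_tx_a)
cov_b = coefficient_of_variation(cfo_tx_b)
```

Both toy transmitters have a mean CFO of 100, so the mean alone cannot separate them, while the COV of the second is five times that of the first.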

Effect of number of samples
Fig. 5(a) shows the plot of detection accuracy versus the number of samples in each frame for different neural networks. The general trend is that, for each NN configuration, detection accuracy improves with the number of samples (along the x-axis). This is expected because a higher number of samples provides more information and hence better performance. Also, the > 95% accuracy point is reached at around 150 samples per frame, which is equivalent to 12.5 ms of total data (or 1.8 ms of test data). Hence, we can reach the 95% accuracy bar using a small amount of test data. The accuracy of simple ML networks drops when the number of TX is large, whereas the neural network still holds up with > 99.9% accuracy.

Effect of the Number of Frames in Feature Set
Fig. 5(b) shows accuracy for two different frame numbers, 500 and 1000. With a higher frame number, the information content of each transmitter device increases and so detection gets better, as shown by the blue (1000 frames) and brown (500 frames) lines. We can generalize the previous subsection (sample number effect) and this subsection as follows: more data yield better performance.

Effect of the Neural Network Parameters
Fig. 5(c) shows the plots of accuracy versus the number of neurons in each hidden layer. As the number of neurons increases along the x-axis, accuracy, in general, gets better. Also, as the number of hidden layers increases, the network performs better initially, but later it runs into overfitting, where the model capacity is too large compared to the data. This phenomenon directly manifests itself as a degradation in performance. Hence, there is an optimum model capacity up to which accuracy increases, and beyond which accuracy drops.

Using Simple Machine Learning (ML) Algorithm
It has been observed that the COV values vary significantly among different transmitters. When a simple feature displays a significant separation among different classes, the classes can be modeled with an if-else ladder structure. This implies that even simple ML algorithms (e.g., decision trees) can show good results. Fig. 6(a) shows that some popular ML algorithms achieve > 95% accuracy.
The true power of the neural network comes into play when the number of TX increases, as shown in Fig. 6(b). For this, features are generated for 300 TX devices following a Gaussian distribution (as in [10]) with the same mean and variance as those of the original 30 TX devices, for both inter- and intra-class variations. Fig. 6(b) shows that as the number of TX increases, accuracy falls after a certain point (∼ 100 TX) even for support vector machines (SVM), which fail to converge for > 150 TX.

Figure 7. Comparison of the network performance in the cases of including and excluding the wireless channel data. The network needs 15 neurons, compared to 10 neurons, in a hidden layer to achieve 95% accuracy for the case where the channel is considered. But with higher model capacity, both lines converge and the network learns the channel effect on data. The light red box shows the region where the network fails to learn transmitter variation. The light yellow box ("Enough for TX, but not channel variation") shows the region where the network learns transmitter variation but fails to learn the variation due to the wireless channel. The light green box ("Enough for both TX and channel variation") shows the region where the network learns both the transmitter and channel variation properly.

Effect of Wireless Channel
So far, nonidealities due to the TX were considered and the wireless channel was ignored (TX and RX connected via SMA cable). But the channel itself adds some nonidealities. Here, the effect of a static wireless channel (1 m of fixed TX-RX separation) has been analyzed. Fig. 7 shows accuracy versus the number of neurons in a single layer, with and without the wireless channel. For an iso-accuracy of 95%, the wireless channel demands slightly higher model capacity (15 vs 10 neurons). But when the number of neurons increases (> 50), both curves merge and yield similar accuracy.
In one of our recent works [31], we applied RF-PUF to the ORACLE dataset, which contains data for 16 USRP X310 TX for both static and quasi-static (variable TX-RX separation) cases with a channel length varying from 2 ft to 62 ft. We have shown that RF-PUF achieves 100% accuracy up to 38 ft and > 95% accuracy even at a 62 ft channel length. This result confirms that the RF-PUF approach can compensate for the channel with the help of the NN and deliver high performance even over a long wireless channel. On a side note, that work combined with the current work also confirms that RF-PUF achieves high accuracy on experimental data on different platforms (XBee vs USRP radios using WiFi) for different devices.

ANALYSIS OF PUF PROPERTIES
The PUF response to a particular challenge is a probabilistic function. In this section, we determine the intra-PUF hamming distance and inter-PUF hamming distance and discuss various PUF properties ([? ?]) in light of those distances.

Constructability
A PUF class $P$ is constructible if we can create a new PUF instance $puf_m \in P$ through a process $P.\mathrm{Create}$: $puf_m \leftarrow P.\mathrm{Create}$, where $puf_m$ has entropy that makes it distinct from other PUF instances $puf_n$, $n \neq m$. In the case of RF-PUF, the source of entropy is the manufacturing process variation. During fabrication of ICs, there is within-die and die-to-die variation due to the limitations of the manufacturing process. In contrast to many other PUF classes, where a separate mechanism is needed for PUF instance creation, the manufacturing process of the integrated circuit itself serves as the creation process for RF-PUF, which is one of its advantages.

Evaluability
A PUF class $P$ is evaluable if for a random PUF instance $puf_m \in P$ and a random challenge $x$, we can evaluate a response $y$: $y \leftarrow puf_m(x)$. In our case, the challenge is a randomly generated bitstream in MATLAB that is fed into the transmitter, and the corresponding response is the analog signal that contains the unique physical signature of the transmitter.

Inter-PUF Distance - Uniqueness
Uniqueness refers to how different the instances of a PUF class $P$ are from each other. The metric used to represent PUF uniqueness is called the inter-PUF hamming distance and is defined as:

$$HD_{inter} = \mathrm{dist}\left[Y_m^{\alpha}(x),\; Y_n^{\alpha}(x)\right], \quad m \neq n$$

Here, $Y_m^{\alpha}(x)$ and $Y_n^{\alpha}(x)$ are the responses from $puf_m$ and $puf_n$ (two instances of PUF class $P$) under the same environmental condition $\alpha$ and the same challenge $x$. Ideally, these inter-chip hamming distances should be much greater than any intra-chip hamming distances so that the instances can be distinguished. In our experiment, the PUF class is $P$ = RF-PUF and $puf_i$ (where $i = 1, 2, \ldots, 30$) are the instances of that class (30 Xbee devices).
To calculate $HD_{inter}$, the first 1000 frames from each of the transmitters are taken. Each frame contains 3600 samples. Our features remain unchanged: CFO, eight I-Q component values, and COV. But after taking 10 features from each of the 1000 frames, instead of using them as a feature matrix for each transmitter, the mean values of the features are taken across all the frames. This means that instead of representing each transmitter as a 10 × 1000 feature matrix, it is represented as a 10 × 1 feature vector. The reason for averaging across the frames is that the frames have an associated timestamp, i.e., the data of each frame are collected at a different time. So, each frame faces slightly different environmental conditions such as heating of the transmitter due to data transmission for a long time, external interference, noise, etc. Averaging the feature values across a large number of frames mitigates the environmental factors, especially noise. Also, taking the first 1000 frames from each transmitter ensures the same initial heating pattern across devices. So the final outcome is that the feature vector for each transmitter has a very similar environmental factor $\alpha$, which is one of the conditions of inter-chip hamming distance calculation.
After taking the feature vector from each transmitter, the Euclidean distance in the ten-dimensional feature space was calculated and used as the hamming distance. For $puf_m$, let us denote $CFO_m$ = carrier frequency offset, $COV_m$ = coefficient of frequency offset variation, $I_{k,m}$ = in-phase component in the $k$-th quadrant, and $Q_{k,m}$ = quadrature-phase component in the $k$-th quadrant. Then the distance $d_{m,n}$ between the $puf_m$ and $puf_n$ instances is given by:

$$d_{m,n} = \sqrt{(CFO_m - CFO_n)^2 + (COV_m - COV_n)^2 + \sum_{k=1}^{4}(I_{k,m} - I_{k,n})^2 + \sum_{k=1}^{4}(Q_{k,m} - Q_{k,n})^2} \tag{1}$$

The inter-chip distances were calculated for each transmitter with respect to all 30 transmitters (including the chip under test), which leads to a 30 × 30 symmetric matrix (upper and lower triangular matrices with the same values, since $d_{m,n} = d_{n,m}$ = inter-chip distance between $puf_m$ and $puf_n$) with a principal diagonal of zeros (self-distance). It is found that in the worst-case scenario the minimum distance is $HD_{inter,min} = 0.2307$, and in the best-case scenario the maximum distance is $HD_{inter,max} = 10.149$. In the literature, a mean inter-PUF distance, $\mu_{inter}$, is often reported, which is the average of all $HD_{inter}$. The formula is:

$$\mu_{inter} = \frac{2}{N_{puf}(N_{puf}-1)\,N_{chal}} \sum_{x} \sum_{m=1}^{N_{puf}-1} \sum_{n=m+1}^{N_{puf}} HD_{inter}$$

where $N_{puf}$ is the number of PUF instances ($N_{puf} = 30$ for us), and $N_{chal}$ is the number of challenges ($N_{chal} = 1$, since we are not varying our challenge). Using this formula, we find that $\mu_{inter} = 3.703$.
Fig. 8(c) shows the probability mass function distribution of the 435 (= 30 × 29 / 2) inter-PUF distances. The density function is right-skewed, which is why a Weibull fit (which is exponential in nature) describes it more accurately than a normal distribution fit. The fit shows that the curve is sparse on the right side but concentrated rather than sparse on the left side, which is good because it helps ensure that the inter-PUF values do not overlap the intra-PUF distances, which should ideally be at zero.
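The pairwise inter-PUF computation can be sketched as follows: each transmitter is represented by one 10-element mean feature vector, and the Euclidean distance is taken over all unordered pairs. The feature vectors here are random placeholders, not measured data, so the printed mean has no relation to the reported $\mu_{inter}$; the helper names are hypothetical.

```python
import math
import random
from itertools import combinations

def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def inter_puf_distances(feature_vectors):
    """Distances over all unordered pairs (m, n) with m < n."""
    return [euclidean(u, v) for u, v in combinations(feature_vectors, 2)]

# Placeholder data: 30 transmitters, one 10-element mean feature vector each.
rng = random.Random(1)
vectors = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(30)]

distances = inter_puf_distances(vectors)
mu_inter = sum(distances) / len(distances)  # mean over the 435 pairs
print(len(distances), min(distances), max(distances), mu_inter)
```

Iterating over unordered pairs gives the upper triangle of the 30 × 30 symmetric matrix directly, which is where the count of 435 distances comes from.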

Intra-PUF Distance - Reliability
PUF responses are in general dependent on various environmental factors, which makes the response of any PUF instance a probabilistic function. This means that a particular PUF instance can provide slightly different feature values under varying environmental conditions. For authentication purposes, this poses an issue. Reliability refers to how resilient a PUF instance is against environmental factors, e.g., noise, interference, temperature, supply voltage, etc.
The metric used to represent how reliable a particular instance of a PUF class $P$ is, is called the intra-PUF hamming distance and is defined as:

$$HD_{intra} = \mathrm{dist}\left[Y_m^{\alpha}(x),\; Y_m^{\beta}(x)\right], \quad \alpha \neq \beta$$

Here, $Y_m^{\alpha}(x)$ and $Y_m^{\beta}(x)$ are the responses from $puf_m$ under two distinct environmental conditions $\alpha$ and $\beta$ and the same challenge $x$. Many $HD_{intra}$ distances are calculated at different environmental conditions. Ideally, these intra-chip hamming distances should be zero.
To calculate $HD_{intra}$, we follow two steps. Let us consider one particular PUF instance $puf_m$. In step 1, the first 1000 frames (frame number 1 to frame number 1000) were taken from $puf_m$, each frame containing 3600 samples. Then the mean values of the previously mentioned ten features were taken, just as before, to represent it as a 10 × 1 feature vector. Let us denote this vector as $f_{v,1}$. Then, in step 2, the first 5 frames are skipped and the next 1000 frames are taken, from frame number 6 to frame number 1005. Step 1 is repeated here to get the next feature vector $f_{v,2}$. Then the next 1000 frames are taken from frame number 11 to frame number 1010 and a feature vector $f_{v,3}$ is formed. This process is repeated to obtain 80 different feature vectors $f_{v,\alpha}$; $\alpha = 1, 2, \ldots, 80$. These 10 × 1 feature vectors are stacked together to form a feature vector set $f_{set,m}$ of size 10 × 80 for $puf_m$. The whole process is then repeated for all 30 devices.
The purpose of taking frame-shifted or time-shifted frame groups is to account for the time factor. Each frame has a duration of 0.6 ms, so a gap of 5 frames between two frame groups gives a time difference of at least 3 ms (in reality the difference is much larger, since the transmitter transmits data for a short time and most of the frames are just noise, which are filtered out in the data pre-processing step). The 80 time-spaced frame groups, in reality, cover almost half a minute. Our 2.4 GHz clock will have an LO drift cycle time in the nanosecond range. Hence, half a minute of data can incorporate significant environmental factors into the frame data. So, it can be assumed that the feature vectors $f_{v,\alpha}$; $\alpha = 1, 2, \ldots, 80$ in the feature vector set $f_{set,m}$ of $puf_m$ represent 80 different environmental conditions. Now, for each instance $puf_m$, the Euclidean distance is calculated in the 10-dimensional feature space among the feature vectors in the feature vector set using Eq. 1. This results in a symmetric matrix of size 80 × 80 with a principal diagonal of zeros. This process is repeated for the other transmitters as well. Essentially, it gives us 30 matrices of size 80 × 80 for intra-PUF distances. In the best-case scenario, the minimum distance is $HD_{intra,min} = 7.23 \times 10^{-5}$, and in the worst-case scenario, the maximum distance is $HD_{intra,max} = 0.73$. Fig. 8 shows the distribution of the 94800 (= 30 × 80 × 79 / 2) intra-PUF distances. The density function is right-skewed, and a Weibull distribution gives a better fit for it, just as in the inter-PUF case. This fit shows that on the left side the curve is strongly concentrated towards zero, but has a diminishing tail on the right. This tail slightly overlaps the inter-PUF distances and causes a few detection errors. Detection probability is discussed in the next subsection.
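Finally, a mean intra-PUF distance, $\mu_{intra}$, is calculated, which is the average of all $HD_{intra}$. The formula is:

$$\mu_{intra} = \frac{2}{N_{puf}\,N_{chal}\,\alpha(\alpha-1)} \sum_{m=1}^{N_{puf}} \sum_{x} \sum_{i<j} HD_{intra}$$

where $N_{puf}$ is the number of PUF instances ($N_{puf} = 30$ for us), $N_{chal}$ is the number of challenges ($N_{chal} = 1$, since we are not varying our challenge), $\alpha$ is the number of environmental conditions ($\alpha = 80$ in our study), and the innermost sum runs over all pairs $i < j$ of distinct environmental conditions.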
Using this formula, it is found that µ intra = 0.136.
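The averaging described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the feature values are synthetic Gaussian stand-ins (not the measured dataset), and the helper names `intra_puf_distances` and `mean_intra_distance` are hypothetical. Only the dimensions (30 transmitters, 80 conditions, 10 features) follow the text.

```python
import math
import random
from itertools import combinations

def intra_puf_distances(feature_set):
    """Euclidean distance for every unordered pair of feature vectors of one transmitter."""
    return [math.dist(a, b) for a, b in combinations(feature_set, 2)]

def mean_intra_distance(feature_sets):
    """Average the pairwise distances over all transmitters; also return how many there are."""
    all_d = [d for fs in feature_sets for d in intra_puf_distances(fs)]
    return sum(all_d) / len(all_d), len(all_d)

# Synthetic stand-in data (assumption: NOT the paper's measurements).
rng = random.Random(0)
N_PUF, ALPHA, DIM = 30, 80, 10
feature_sets = [
    [[rng.gauss(0.1 * m, 0.05) for _ in range(DIM)] for _ in range(ALPHA)]
    for m in range(N_PUF)
]

mu_intra, n_dist = mean_intra_distance(feature_sets)
print(n_dist)  # 30 * (80 * 79 / 2) = 94800 pairwise distances
print(f"mean intra-PUF distance on synthetic data: {mu_intra:.3f}")
```

Taking unordered pairs with `combinations` corresponds to using only the upper triangle of each symmetric 80 × 80 matrix, so the zero diagonal and the mirrored entries do not enter the average.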

Identifiability
In the previous two subsections, both the inter-PUF and intra-PUF Hamming distances and their mean values, $\mu_{inter} = 3.703$ and $\mu_{intra} = 0.136$, were calculated. Their comparison shows that $\mu_{inter} > \mu_{intra}$, which establishes that on average the PUF instances can be distinguished from each other. But the mean value does not tell the full story. Fig. 8(b) shows the fitted distribution curves superimposed on each other. The brown curve (intra-PUF distribution) lies toward the left and the blue curve (inter-PUF distribution) lies toward the right, and they mostly cover different regions. However, there is a slight overlap between them, which is shown in the inset as a zoomed version of the overlapping area. Ideally, there should be no overlap. In a practical scenario, this overlapping region is the source of detection error.
From the definition of identifiability, a PUF class $P$ is identifiable if it is reliable as well as unique, and if the probability of the inter-PUF variation being greater than the intra-PUF variation is very high. Mathematically:
$$\text{Probability}(HD_{inter} > HD_{intra}) \approx 1$$
In the previous two subsections, 94800 ($= \frac{30 \times 80 \times 79}{2}$) intra-PUF distances and 435 ($= \frac{30 \times 29}{2}$) inter-PUF distances were calculated. Now, each of these inter-PUF distances is compared with each of the intra-PUF distances, which leads to $435 \times 94800 = 41238000$ cases, among which $HD_{inter} > HD_{intra}$ is found in 41184206 cases:
$$\text{Probability}(HD_{inter} > HD_{intra}) = \frac{41184206}{41238000} \approx 0.9987$$
This is a very high probability and close to 1. This shows that RF-PUF has strong identifiability, and this property, along with reliability, uniqueness, constructability, and evaluability, establishes RF-PUF as a distinct PUF class. This is the first-ever experimental validation of RF-PUF as a distinct and strong PUF class by itself.
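The exhaustive comparison above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `identifiability` and the toy distance lists are assumptions. Sorting the intra-PUF distances once and using binary search avoids looping over all 41238000 pairs explicitly while giving the same count.

```python
import bisect

def identifiability(inter, intra):
    """Fraction of (inter, intra) pairs for which the inter distance is strictly larger."""
    intra_sorted = sorted(intra)
    # bisect_left returns how many intra distances are strictly smaller than d.
    wins = sum(bisect.bisect_left(intra_sorted, d) for d in inter)
    return wins / (len(inter) * len(intra))

# Toy distances (hypothetical values, only to exercise the function).
toy_inter = [3.0, 4.0, 0.5]
toy_intra = [0.1, 0.2, 0.6, 0.05]
print(identifiability(toy_inter, toy_intra))  # 11 of the 12 pairs satisfy inter > intra

# Arithmetic check of the counts reported in the text.
n_intra = 30 * 80 * 79 // 2   # 94800
n_inter = 30 * 29 // 2        # 435
print(round(41184206 / (n_inter * n_intra), 4))  # 0.9987
```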

POSSIBLE ATTACK MODELS ON RF-PUF
RF-PUF does not store any digital key and hence is not susceptible to malicious PUF models, which assume that the adversary can access all the challenge-response pairs through built-in logger software or an implanted Trojan. However, the possibility of a machine learning-based attack needs to be discussed (Fig. 10). For RF-PUF, an ML attack is a two-step process:
• Step 1: model/profile the victim TX (unsupervised)
• Step 2: use that model for spoofing/replay attacks
In step 1, the rogue device tries to learn the feature/parameter values of the victim TX. Unlike for the intended RX, this is an unsupervised problem for the attacker. We have utilized k-means clustering to divide the feature map into 30 clusters and compared the predicted and true labels (Fig. 9). The process was repeated 1000 times, as the k-means solution is not unique without specific conditions. Our analysis shows that clustering achieves ∼3.63% accuracy on average, which is very close to the probability of random detection ($\frac{1}{30} \approx 3.3\%$). So, in practice, it is almost impossible to get the right feature value and label.
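The attacker's step 1 can be sketched with a plain Lloyd's k-means in pure Python. Several things here are assumptions rather than details from the text: the data are synthetic, the problem is shrunk to 5 transmitters and 50 repetitions to keep the run short, and the arbitrary cluster indices are compared directly with the true labels because the text does not say whether any label-matching step was used. The names `kmeans` and `raw_accuracy` are hypothetical.

```python
import math
import random

def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns the cluster index assigned to every point."""
    centroids = rng.sample(points, k)  # random initialization from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(x) / len(members) for x in zip(*members)]
    return labels

def raw_accuracy(pred, true):
    """Share of points whose cluster index happens to equal the true label."""
    return sum(p == t for p, t in zip(pred, true)) / len(true)

# Synthetic stand-in data (assumption: NOT the paper's feature map).
data_rng = random.Random(1)
K, PER_TX, DIM = 5, 20, 3
true_labels = [m for m in range(K) for _ in range(PER_TX)]
points = [[data_rng.gauss(m, 0.3) for _ in range(DIM)] for m in true_labels]

# Repeat with different initializations, since the result depends on them.
accs = []
for run in range(50):
    pred = kmeans(points, K, random.Random(run))
    accs.append(raw_accuracy(pred, true_labels))

mean_acc = sum(accs) / len(accs)
print(f"mean raw accuracy over {len(accs)} runs: {mean_acc:.3f} "
      f"(chance = {1 / K:.3f})")
```

Because the attacker has no labels, a cluster index carries no identity information by itself, which is why repeating the run and averaging is needed before comparing the result with the chance level.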
If the attacker somehow succeeds in step 1, then in step 2 the attacker needs to produce an RF signal that contains the same imperfections as the victim TX with high accuracy. This requires high-speed and high-resolution circuitry. Fig. 10 shows that the physical signature of the transmitter, $S$, goes through the transformation $T_{TX}$ at the TX and $T_{RX}$ at the RX. The transformations in the attacker are $T_A$, $T_{ML1}$, $T_{ML2}$, and $T_D$, respectively. The full transformation for the original device is $T_{RX}(T_{TX}(S))$, and for the adversary it is $T_{RX}(T_D(T_{ML2}(T_{ML1}(T_A(T_{TX}(S))))))$. The adversary's ML2 framework needs to make these two transformations equal by undoing the effect of its ADC/DAC, which requires almost infinite resolution, rendering it practically impossible (typical ADCs/DACs are 8/16-bit). This resolution limitation in the ADC/DAC and the bandwidth limitation in filters and other RF components also prevent a replay attack, which requires the attacker to convert the TX signal to the digital domain, incorporate malicious content, and then transform it back into the RF domain with very high precision. Further analysis of the precision requirements for a practical attack will be included in future work. The robustness of RF-PUF against

Figure 2. (a) Commodity off-the-shelf devices (30 Xbee S2C modules) used for data collection. (b) Conceptual experimental setup. (c) Actual experimental setup in the lab. The TX and RX are placed 1 m apart (they are close here for image-capturing purposes only), and a HackRF module was used to collect data either from the TX (case 1) or the RX (case 2). GNU Radio records the collected data and shows a live constellation (visible on-screen). The rotating constellation is later processed in MATLAB through coarse and fine frequency compensation.

Fig. 2(a) shows the Xbee devices, whereas Fig. 2(b) and Fig. 2(c) show the block diagram and the actual setup. The TX and RX are kept 1 m apart.

Figure 4. (a) Accuracy vs. the number of neurons in each hidden layer. Even after increasing the number of hidden layers, the accuracy remains < 75%. (b) Principal Component Analysis (PCA) reveals that the first principal component (PC) causes most of the variation, which in turn depends mostly on the carrier frequency offset (CFO). (c) Mean ($\mu$) and standard deviation ($\sigma$) of the dominant feature (CFO), analyzed in search of a new feature. It reveals that these statistical parameters vary significantly among transmitters. So, their ratio, the coefficient of frequency offset variation, $COV = \sigma_{CFO} / \mu_{CFO}$ (the standard deviation of the CFO divided by the mean of the CFO), is taken as the tenth feature. (d) Inclusion of COV shows significant improvement in the detection accuracy. Using a single hidden layer with only 10 neurons, 95% accuracy is achieved, and > 99% accuracy is reached for > 50 neurons.

Figure 5. (a) Detection accuracy vs. the number of samples per frame for different neural networks, which shows a trend of accuracy improvement (indicated by the red arrow) as the number of samples increases. (b) Detection accuracy for different frame numbers shows that a higher frame number gives better performance. (c) Detection accuracy vs. the number of neurons per layer. The general trend shows that accuracy improves as the number of neurons in each layer increases. Also, accuracy typically improves with more hidden layers until the higher model order causes overfitting and degrades performance.

Figure 6. (a) Popular ML algorithms show high accuracy for 30 Xbee devices. (b) The accuracy of simple ML networks drops when the number of TX is large, whereas the neural network still holds up with > 99.9% accuracy.

Figure 8. Data distribution of (a) intra-PUF Hamming distances and (c) inter-PUF Hamming distances. Due to skewness, a Weibull distribution fit is a more accurate representation in these cases. (b) The two Weibull curves are superimposed on each other. There is a very slight overlap (yellow region) between the curves, which is shown in a zoomed inset. Although small, this overlap is the source of the detection error.

Figure 9. Heatmap of unsupervised learning in the attacker using k-means clustering. (a) The worst-case accuracy of 0.09% and (b) the best-case accuracy of 6.8%. Repeating the clustering 1000 times gives 3.63% accuracy on average.