- 1 Department of Software Engineering, Faculty of Engineering, Koya University, Koy Sanjaq, Iraq
- 2 Department of Research Engineering Center, Deanship of R&D Centers, Koya University, Koy Sanjaq, Iraq
1 Introduction
The widespread integration of inertial sensors in smartphones has encouraged considerable research interest in gait recognition and human identification. Nonetheless, early foundational studies established their methodologies under unrealistic experimental conditions: wearable sensors, too few participants to construct representative datasets, and highly controlled settings such as fixing inertial sensors at specific body positions. For example, several studies relied on wearable inertial measurement units (IMUs), which are impractical for everyday human recognition. Mondal et al. (2012) developed the Intelligent Gait Oscillation Detector (IGOD), a wearable system with eight rotation sensors recording locomotion oscillations from 30 participants. However, the model requires full cooperation from the individual, which is unsuitable for natural daily use. Furthermore, Deb et al. (2020) examined wrist and ankle placements across 50 individuals, revealing strong identification potential but remaining constrained by fixed placement requirements. In addition, Dehzangi et al. (2017) deployed five synchronized IMUs at multiple body positions on 10 participants, requiring extensive instrumentation that diverged from natural smartphone handling. Semwal et al. (2022) recorded 25 individuals with IMU sensors on six body joints, capturing six walking styles.
On the other hand, most recent studies address human gait recognition with datasets built from small numbers of participants. For instance, Andersson et al. (2024) focused on hip-joint angle estimation using smartphone sensor measurements from 10 individuals on a treadmill at 4 km/h, providing biomechanical insights but failing to replicate daily walking variability. Cao et al. (2022) developed a hybrid network combining convolutional and recurrent layers for authentication using accelerometer and gyroscope data from 40 participants, though requiring fixed right-front-pocket placement. Kim et al. (2025) introduced “GaitX” for real-time distracted-walking detection with 21 participants under handheld and pocketed positions. Raziff et al. (2017) tested three handheld positions, including devices held against the abdomen, with 30 participants. Shi et al. (2023) integrated a 1D-CNN with a bidirectional LSTM across three datasets (18–30 subjects) requiring fixed sensor placement at the waist, belt, or multiple body positions. Filippou et al. (2023) developed StepMatchDTWBA for wrist-worn accelerometers on 30 volunteers under controlled laboratory conditions. Hoang et al. (2015) addressed orientation instability in front-pocket placement by transforming accelerometer data to Earth coordinates for 38 individuals, requiring concurrent orientation sensor data at a low sampling rate (27 Hz). Multi-sensor fusion and validation approaches have further explored positioning diversity. Shahar and Agmon (2021) validated smartphone front-pocket placement against the APDM mobility lab with 60 adults, revealing asymmetric recognition between the phone-side and contralateral legs. Rafiq et al. (2025) investigated controlled thigh-mounted positioning across two datasets (Extrasensory: 60 participants; KU-HAR: 90 participants), though requiring fixed positioning that limited natural usage.
Another issue with the available human gait recognition work is that both the testbed and the style of holding the smartphone are controlled. For example, Li et al. (2025) examined pants-pocket placement with six participants for real-time recognition of seven activities, augmenting a 1D-CNN with time-domain features at 1.5-s intervals. A recent study by Degbey et al. (2024) includes a comparatively large sample of 173 participants using waistband-pocketed phones during straight-line walking. Similarly, Al-Mahadeen et al. (2023) used 100 participants from the HMOG dataset who held phones while typing across eight sessions, demonstrating approaches ranging from pocket-mounted to handheld configurations at 100 Hz sampling rates.
Despite advancing modeling techniques, these studies remain fundamentally limited by modest sample sizes (6–173 subjects) and prescribed carrying positions (waist, pocket, belt, or handheld) with wearable or IMU sensors. Furthermore, these studies were conducted in controlled laboratory environments, restricting their ecological validity and their ability to represent natural smartphone usage patterns. To address these limitations, this data article presents a large-scale smartphone-based gait dataset encompassing 390 participants walking under unconstrained, real-world conditions, without fixing sensors at specific positions on the participants' bodies. The dataset uses the built-in smartphone accelerometer, gyroscope, and magnetometer. Data were collected as individuals walked short distances while naturally holding the smartphone without prescribed placement constraints. This methodology captures authentic walking behavior and sensor variability representative of everyday smartphone usage, providing a robust foundation for developing gait recognition systems that reflect real-world conditions.
2 Data description
2.1 Dataset overview
This dataset comprises synchronized triaxial inertial sensor data from 390 participants (61% male, 39% female), with each participant completing 10 independent walking trials. Participants, students and faculty from Koya University, ranged in age from 18 to 51 years. The dataset therefore provides 3,900 individual recordings organized hierarchically by participant. Each record contains nine-dimensional measurements from the accelerometer (ACC_X, ACC_Y, and ACC_Z), gyroscope (GY_X, GY_Y, and GY_Z), and magnetometer (MAG_X, MAG_Y, and MAG_Z). The inclusion of magnetometer data extends beyond conventional accelerometer-gyroscope configurations, enabling multi-sensor fusion for enhanced gait characterization (Cao et al., 2022). Individual sequences contain approximately 250–400 temporal samples per axis, capturing complete walking episodes at 30 Hz. Figure 1 illustrates representative sensor signal patterns and value distributions across all three axes for each sensor modality, demonstrating the characteristic temporal dynamics and statistical properties of naturalistic gait data.
2.2 Data acquisition hardware and sensors
The dataset was collected using a Samsung Galaxy A53 smartphone (Samsung Electronics, Suwon, South Korea) as the sensing platform. The embedded inertial sensors were configured to sample all three sensor modalities synchronously at 30 Hz throughout each walking trial. A custom Android application provided intuitive start/stop controls, allowing participants to independently initiate and terminate the recording of each trial. The 30 Hz sampling frequency was selected to adequately capture human gait dynamics while keeping data volumes and battery consumption manageable during extended data collection sessions. This rate has been shown to be sufficient for capturing fundamental gait characteristics and temporal patterns in most current applications involving ambulatory motion.
2.3 Experimental protocol
Each participant completed 10 walking trials along a straight 12-m pathway, totaling 120 m per participant and 46.8 km across all 390 participants. To maximize ecological validity, participants walked at their natural, self-selected pace while holding the smartphone in their dominant hand, with no constraints on device orientation or grip position. This protocol makes the dataset more realistic than those of earlier studies, which fixed smartphones or wearable IMU sensors at prescribed positions on the participants' bodies. The unconstrained approach, contrasting with the rigid device fixation used in previous studies (Kim et al., 2025; Raziff et al., 2017), reflects real-world usage scenarios where fixing device orientation is impractical (Deb et al., 2020). The naturalistic protocol captures authentic variability in smartphone handling and orientation, enhancing the external validity and generalizability of models trained on this dataset.
2.4 Data collection period and data source location
Dataset acquisition spanned four months (September 2024–January 2025), encompassing application development, infrastructure establishment, and systematic recruitment of all 390 participants from Koya University, Iraq. Participants were predominantly students, with additional university faculty. Data collection occurred at two standardized indoor locations: the ground-floor corridor of the Deanship of Research and Development Centers Building and the second-floor hall of the Faculty of Physical Education Library. These controlled environments ensured consistent ambient conditions while maintaining ecological validity through naturalistic device handling and gait patterns; here, "controlled" means only that all participants walked the same distance and held the smartphone in their dominant hand. Holding a smartphone while walking is common during active use (texting, navigation, checking apps), so this approach captures a real-world scenario better than existing datasets that strap devices to fixed body positions such as the waist or belt. Natural grip variations were intentionally allowed, without prescribed orientations, to reflect authentic handling patterns. This design enables practical applications including continuous authentication during app use, fall detection, and mobility monitoring while users actively hold their devices. We acknowledge that this approach does not represent phones kept in pockets or bags.
2.5 Dataset applications
This dataset supports multiple research domains: (1) biometric identification and continuous authentication using gait-based person recognition, (2) gait analysis, including stride dynamics and walking kinematics across diverse populations, and (3) human activity recognition and computational modeling of locomotion behavior. The unconstrained collection approach enhances applicability in real-world deployment scenarios where rigid device positioning is impractical, making the resource particularly valuable for behavioral biometrics, mobile health monitoring, and deep learning applications requiring ecologically valid training data.
2.6 Value of the dataset
This study presents a novel dataset for human gait identification using smartphone sensors. The main characteristics of the dataset are as follows:
1- The dataset provides unrestricted access to comprehensive nine-channel inertial sensor measurements from the accelerometer, gyroscope, and magnetometer.
2- Unlike existing datasets with constrained smartphone positioning, this dataset captures natural handheld usage patterns, offering researchers authentic data for developing biometric solutions applicable in the real world.
3- The dataset is among the largest for human gait identification using smartphone inertial sensors, with 390 participants contributing 46.8 km of walking in total.
3 Data processing
3.1 Raw data loading and magnitude calculation
The preprocessing pipeline systematically processes the 3,900 CSV files containing the raw inertial sensor measurements. Each file represents a single walking trial with nine-channel time-series data sampled at 30 Hz. The workflow transforms the three-dimensional sensor vectors into scalar magnitude values representing overall motion intensity. For each sensor modality (accelerometer, gyroscope, and magnetometer), magnitude vectors are computed using the Euclidean norm, as expressed in Equation 1:

M = √(x² + y² + z²)    (1)

where M denotes the computed norm of the three-dimensional sensor measurement, and x, y, and z represent the measurements from the respective sensor axes. This transformation captures total acceleration, angular velocity, and magnetic field strength independent of smartphone orientation, providing gait representations that are less sensitive to variations in smartphone positioning. The magnitude calculation function performs three operations: (1) loads each CSV file, (2) verifies data integrity across all required sensor channels (accx, accy, accz, gyx, gyy, gyz, magx, magy, and magz), and (3) generates three magnitude signals per trial, one for each sensor modality.
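For illustration, a minimal Python sketch of this loading-and-magnitude step follows. The column names match the list above, while the function name and error handling are illustrative rather than taken from the authors' code.

```python
import numpy as np
import pandas as pd

REQUIRED_COLS = ["accx", "accy", "accz",
                 "gyx", "gyy", "gyz",
                 "magx", "magy", "magz"]

def compute_magnitudes(csv_path):
    """Load one walking trial and return the three magnitude signals (Equation 1)."""
    df = pd.read_csv(csv_path)

    # Verify data integrity: all nine sensor channels must be present.
    missing = [c for c in REQUIRED_COLS if c not in df.columns]
    if missing:
        raise ValueError(f"{csv_path}: missing channels {missing}")

    # Euclidean norm per sensor modality.
    acc_mag = np.sqrt(df["accx"]**2 + df["accy"]**2 + df["accz"]**2)
    gy_mag  = np.sqrt(df["gyx"]**2  + df["gyy"]**2  + df["gyz"]**2)
    mag_mag = np.sqrt(df["magx"]**2 + df["magy"]**2 + df["magz"]**2)
    return acc_mag.to_numpy(), gy_mag.to_numpy(), mag_mag.to_numpy()
```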
3.2 Filters and selection criteria for gait cycle extraction
Following magnitude computation, preprocessing is applied to attenuate high-frequency noise while preserving gait-relevant motion characteristics. A fourth-order Butterworth low-pass filter with a cutoff frequency of 10 Hz is applied to each of the three sensor magnitude signals. Prior to filtering, edge effects are mitigated by trimming 30 samples from the beginning and 60 samples from the end of each trial sequence. Peak detection is then performed on the filtered accelerometer magnitude to identify local maxima corresponding to heel-strike events during walking. The detection algorithm employs three constraints: a minimum inter-peak distance of 10 samples, a minimum peak height of 10 m/s², and a minimum prominence of 2.5 m/s². The detected peaks are further refined by enforcing an inter-peak distance of no more than 22 samples between contiguous peaks, ensuring physiologically reasonable step intervals while eliminating spurious detections. Gait cycles are then segmented by identifying triplets of consecutive refined peaks, where each peak represents a heel-strike event. Since each step corresponds to half of a complete stride, three consecutive peaks define one full gait cycle: the first and third peaks represent successive heel strikes of the same foot, while the intermediate peak captures the contralateral foot's heel strike. The temporal span between the first and third peaks therefore constitutes one complete stride cycle. To ensure biomechanical validity, a stringent quality criterion is enforced: only cycles with durations between 28 and 44 samples (corresponding to 0.93–1.47 s at the 30 Hz sampling rate) are retained.
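The sketch below outlines this filtering and segmentation step with SciPy, using the parameter values stated above (fourth-order Butterworth, 10 Hz cutoff, 30/60-sample trimming, peak constraints, and the 28–44 sample cycle window). The refinement rule for contiguous peaks is paraphrased from the text, so the exact behavior may differ from the authors' implementation.

```python
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

FS = 30  # sampling rate in Hz

def extract_gait_cycles(acc_mag):
    """Filter the accelerometer magnitude and segment it into stride cycles."""
    # Mitigate edge effects by trimming the trial boundaries.
    sig = acc_mag[30:-60]

    # Fourth-order Butterworth low-pass filter, 10 Hz cutoff at fs = 30 Hz.
    b, a = butter(N=4, Wn=10 / (FS / 2), btype="low")
    filtered = filtfilt(b, a, sig)

    # Heel-strike candidates: local maxima under the stated constraints.
    peaks, _ = find_peaks(filtered, distance=10, height=10, prominence=2.5)

    # One plausible reading of the refinement step: keep peaks whose gap to the
    # previously kept peak does not exceed 22 samples.
    refined = [peaks[0]] if len(peaks) else []
    for p in peaks[1:]:
        if p - refined[-1] <= 22:
            refined.append(p)

    # Triplets of consecutive peaks -> one full stride (same-foot heel strike to heel strike).
    cycles = []
    for i in range(len(refined) - 2):
        start, end = refined[i], refined[i + 2]
        if 28 <= end - start <= 44:  # 0.93-1.47 s at 30 Hz
            cycles.append((start, end))
    return filtered, cycles
```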
3.3 Temporal normalization and padding selection
Variability in gait-cycle length arises from differences in individual walking speed and stride pattern, necessitating temporal normalization for standardized feature extraction and classification. Temporal normalization is performed by computing the local median cycle length for each participant/class and using it as the target standardization length. Twelve padding techniques are evaluated, including constant, edge replication, linear ramp, maximum/minimum extension, mean, reflection, wrapping, symmetric, median, cubic spline interpolation, and Piecewise Aggregate Approximation (PAA). Their effectiveness is assessed using minimum Euclidean distance measurements to evaluate how well gait similarity patterns are preserved. PAA demonstrated superior performance and best preserved temporal relationships. The PAA technique is therefore implemented at both the local (per-class) and global (cross-class) levels, following established frameworks (Lin et al., 2003), and serves as the technique for gait-cycle standardization prior to feature extraction.
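A minimal sketch of PAA-based length normalization is shown below. The segment-averaging formulation and the per-participant median target follow the description above; the authors' exact PAA variant and the Euclidean-distance evaluation of the twelve techniques are not reproduced, and the helper names are illustrative.

```python
import numpy as np

def paa_resize(signal, target_len):
    """Piecewise Aggregate Approximation: map a gait cycle to target_len values
    by averaging the samples that fall into each of target_len equal segments."""
    signal = np.asarray(signal, dtype=float)
    n = len(signal)
    if n == target_len:
        return signal.copy()
    # Segment boundaries over the original index range.
    edges = np.linspace(0, n, target_len + 1)
    out = np.empty(target_len)
    for i in range(target_len):
        lo, hi = int(np.floor(edges[i])), int(np.ceil(edges[i + 1]))
        out[i] = signal[lo:hi].mean()
    return out

def normalize_participant(cycles):
    """Local (per-class) normalization: standardize every cycle of one participant
    to that participant's median cycle length."""
    target = int(np.median([len(c) for c in cycles]))
    return [paa_resize(c, target) for c in cycles]
```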
4 Statistical properties of sensor measurements
To understand the typical patterns observed across our 3,900-file dataset, we examined the descriptive statistics from a representative recording containing 338 samples, as summarized in Table 1. The accelerometer readings along the Z-axis averaged 9.17 m/s² with a standard deviation of 2.17, capturing both the gravitational component and the vertical oscillations characteristic of human gait.
The magnetometer data revealed a strong Z-axis component, averaging −47.32 μT with a standard deviation of 7.68, consistent with what we would expect from the Earth's magnetic field. The overall magnitude of these readings ranged between 38.43 and 63.52 μT. Meanwhile, the gyroscope measurements showed near-zero mean values across all axes (with absolute means below 0.03 rad/s), suggesting minimal rotational bias in the sensor. During natural walking, gyroscope magnitudes typically stayed below 2.3 rad/s.
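Statistics of this kind can be reproduced for any trial with a short pandas sketch like the one below, assuming the CSV column names listed in Section 3.1; the helper name is illustrative.

```python
import numpy as np
import pandas as pd

def trial_statistics(csv_path):
    """Per-axis descriptive statistics plus per-sensor magnitude ranges for one trial."""
    df = pd.read_csv(csv_path)
    stats = df.describe().T[["mean", "std", "min", "max"]]

    # Orientation-invariant magnitude summaries per sensor modality.
    for name, cols in {"acc": ["accx", "accy", "accz"],
                       "gy": ["gyx", "gyy", "gyz"],
                       "mag": ["magx", "magy", "magz"]}.items():
        m = np.sqrt((df[cols] ** 2).sum(axis=1))
        stats.loc[f"{name}_magnitude"] = [m.mean(), m.std(), m.min(), m.max()]
    return stats
```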
One particularly useful aspect of our feature set is the magnitude calculations, which offer rotation-invariant representations of the sensor signals. These features preserve the essential signal intensity while reducing dimensionality from three dimensions down to one. This approach gives us the flexibility to characterize gait using both orientation-dependent features (the individual axis values) and orientation-invariant features (the computed magnitudes), ultimately supporting more robust biometric identification regardless of how the device is positioned.
5 Feature extractions
A set of 57 features is extracted from the combined gait cycles across the 10 trial sessions to characterize individual walking patterns for biometric identification. These features are derived from the accelerometer, gyroscope, and magnetometer magnitude signals using a synchronized multi-sensor approach. Gait cycles are identified through peak detection on the accelerometer magnitude signal, with each cycle defined from one peak to the next, representing the interval from heel strike to heel strike. Once gait-cycle boundaries are established, the temporal indices (i_start, i_end) are recorded and applied consistently to extract the corresponding segments from the gyroscope and magnetometer magnitude signals, as sketched below. This index-based mapping ensures precise temporal alignment across all three sensor modalities, so that features from each sensor capture the same physical walking period. In total, the 57 features provide a well-rounded representation of gait dynamics distributed across the three sensor modalities, spanning statistical measures, temporal characteristics, and frequency-domain descriptors, each designed to capture a different aspect of how a person walks. A detailed explanation of each feature category is provided in the following sub-sections.
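A small sketch of this index-based mapping is given below. It reuses the cycle boundaries and filtered magnitude signals from the preprocessing sketches above, and the function name is illustrative.

```python
def segment_all_sensors(acc_f, gy_f, mag_f, cycles):
    """Apply the accelerometer-derived cycle boundaries (i_start, i_end) to the
    filtered magnitude signals of all three sensors, so that features describe
    the same physical walking period."""
    segments = []
    for i_start, i_end in cycles:
        segments.append({
            "acc": acc_f[i_start:i_end],
            "gy":  gy_f[i_start:i_end],
            "mag": mag_f[i_start:i_end],
        })
    return segments
```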
5.1 Per-sensor statistical and frequency features
An identical set of 12 features was extracted from each sensor's magnitude signal, yielding 36 features in total. These included statistical measures (mean, median, variance, skewness, kurtosis, and the 25th percentile), frequency-domain characteristics (dominant frequency via FFT and mid-frequency power ratio), signal complexity metrics (Hjorth activity, mobility, and complexity), and the area under the curve (AUC) (Andersson et al., 2024; Al-Mahadeen et al., 2023). Features were named following the pattern sensor_number_name (e.g., acc_1_mean, gy_7_dominant_freq, and mag_10_hjorth_complexity).
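The sketch below shows one way to compute this per-sensor feature block for a single gait-cycle segment. The mid-frequency band limits (1–5 Hz), the rectangular AUC integration, and the flattened naming scheme are assumptions, since the article does not specify them.

```python
import numpy as np
from scipy.stats import skew, kurtosis

FS = 30  # sampling rate in Hz

def hjorth(x):
    """Hjorth activity, mobility, and complexity of a 1-D signal."""
    dx, ddx = np.diff(x), np.diff(np.diff(x))
    activity = np.var(x)
    mobility = np.sqrt(np.var(dx) / activity)
    complexity = np.sqrt(np.var(ddx) / np.var(dx)) / mobility
    return activity, mobility, complexity

def per_sensor_features(x, prefix):
    """Twelve statistical / frequency / complexity features for one magnitude segment."""
    x = np.asarray(x, dtype=float)
    freqs = np.fft.rfftfreq(len(x), d=1 / FS)
    power = np.abs(np.fft.rfft(x - x.mean())) ** 2
    dominant_freq = freqs[np.argmax(power)]
    # Mid-frequency power ratio; the 1-5 Hz band is an assumed choice.
    mid = (freqs >= 1.0) & (freqs <= 5.0)
    mid_ratio = power[mid].sum() / power.sum()

    activity, mobility, complexity = hjorth(x)
    feats = {
        "mean": x.mean(), "median": np.median(x), "variance": x.var(),
        "skewness": skew(x), "kurtosis": kurtosis(x), "p25": np.percentile(x, 25),
        "dominant_freq": dominant_freq, "mid_freq_ratio": mid_ratio,
        "hjorth_activity": activity, "hjorth_mobility": mobility,
        "hjorth_complexity": complexity,
        "auc": x.sum() / FS,  # area under the curve via rectangular integration
    }
    return {f"{prefix}_{k}": v for k, v in feats.items()}
```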
5.2 Temporal and regularity features
Temporal dynamics were captured using jerk (first derivative) features, with the mean and standard deviation computed for each sensor (six features: acc_jerk_mean, acc_jerk_std, gy_jerk_mean, gy_jerk_std, mag_jerk_mean, and mag_jerk_std). Gait regularity was assessed using autocorrelation at lag-3 and lag-5 for all three sensors (six features), along with zero-crossing counts for the accelerometer and gyroscope (two features), for a total of 14 temporal and regularity features (Al-Mahadeen et al., 2023).
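A hedged sketch of these temporal and regularity features follows. Computing zero crossings on the mean-centred magnitude and normalizing the autocorrelation via the Pearson coefficient are assumptions, as the article does not state the exact definitions.

```python
import numpy as np

def temporal_regularity_features(acc, gy, mag, fs=30):
    """Jerk statistics, lag-3/lag-5 autocorrelation, and zero-crossing counts."""
    feats = {}

    def autocorr(x, lag):
        # Pearson-normalized autocorrelation at the given lag (assumed normalization).
        x = x - x.mean()
        return np.corrcoef(x[:-lag], x[lag:])[0, 1]

    for name, sig in {"acc": acc, "gy": gy, "mag": mag}.items():
        jerk = np.diff(sig) * fs                       # first derivative of the magnitude
        feats[f"{name}_jerk_mean"] = jerk.mean()
        feats[f"{name}_jerk_std"] = jerk.std()
        feats[f"{name}_autocorr_lag3"] = autocorr(sig, 3)
        feats[f"{name}_autocorr_lag5"] = autocorr(sig, 5)

    # Zero crossings of the mean-centred accelerometer and gyroscope magnitudes.
    for name, sig in {"acc": acc, "gy": gy}.items():
        centred = sig - sig.mean()
        feats[f"{name}_zero_crossings"] = int(np.sum(np.diff(np.sign(centred)) != 0))
    return feats
```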
5.3 Multi-sensor fusion features
Cross-sensor relationships produced seven multi-sensor fusion features: Pearson correlations between sensor pairs (acc_gy_correlation, acc_mag_correlation, and gy_mag_correlation), energy ratios between sensors (acc_gy_ratio, acc_mag_ratio, and gy_mag_ratio), and the total motion energy integrating all three sensors (total_motion_energy).
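These fusion features can be sketched as below; the energy definition (sum of squared samples) used for the ratios and the total motion energy is an assumption.

```python
import numpy as np

def fusion_features(acc, gy, mag):
    """Seven cross-sensor features: pairwise Pearson correlations, energy ratios,
    and total motion energy for one gait-cycle segment."""
    def energy(x):
        return float(np.sum(np.square(x)))  # assumed energy definition

    return {
        "acc_gy_correlation": np.corrcoef(acc, gy)[0, 1],
        "acc_mag_correlation": np.corrcoef(acc, mag)[0, 1],
        "gy_mag_correlation": np.corrcoef(gy, mag)[0, 1],
        "acc_gy_ratio": energy(acc) / energy(gy),
        "acc_mag_ratio": energy(acc) / energy(mag),
        "gy_mag_ratio": energy(gy) / energy(mag),
        "total_motion_energy": energy(acc) + energy(gy) + energy(mag),
    }
```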
6 Conclusion
This data article presented a new gait-recognition dataset collected under unconstrained conditions from a large number of participants. The total number of participants is 390, and their walking produced natural patterns because individuals simply carried the smartphone in their hand, without any restrictive mounts or attachments. This authentic methodology supports the creation of human gait recognition systems that function during typical smartphone use and aids health-focused applications, such as identifying fall hazards and movement difficulties in aging populations.
In addition, the inclusion of data from multiple sensors allows researchers to determine the most useful measurements for specific tasks, guiding the design of optimized recognition solutions. Releasing this dataset publicly tackles a major obstacle in biometric studies: the scarcity of standard, shareable data, which has historically prevented consistent validation and comparison between research efforts.
The dataset demonstrates that large-scale, real-world behavioral data are feasible to gather, effectively connecting academic research with practical implementation. Expanding the dataset to include long-term tracking, diverse settings, and a wider range of participants will increase its utility for advancing both mobile security technologies and digital health tools.
Data availability statement
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: https://doi.org/10.6084/m9.figshare.30597743.v1.
Ethics statement
This study was reviewed and approved by the Ethics Committee of Koya University (Code No. KUEN25001). The study was conducted in accordance with the Declaration of Helsinki. All 390 participants provided written informed consent prior to data collection. Participants were informed about the study purpose, data collection procedures involving smartphone inertial sensors, and plans for public data sharing. Informed consent explicitly included permission to share anonymized sensor data in open-access repositories. The dataset has been fully anonymized. All inertial sensor data (accelerometer, gyroscope, and magnetometer) were pseudonymized using unique alphanumeric codes that are de-linked from any personally identifiable information. The dataset is shared in an anonymized form, ensuring that individual participants cannot be re-identified from the sensor data alone.
Author contributions
LA: Data curation, Project administration, Validation, Conceptualization, Investigation, Writing – review & editing, Methodology, Supervision, Funding acquisition, Resources, Writing – original draft, Formal analysis, Software, Visualization. AS: Conceptualization, Validation, Writing – review & editing, Project administration, Methodology. HM: Methodology, Writing – review & editing, Conceptualization, Project administration, Validation.
Funding
The author(s) declared that financial support was not received for this work and/or its publication.
Acknowledgments
The authors would like to express their sincere gratitude to all participants who volunteered their time and effort to contribute to this research study. Special thanks are extended to the Deanship of Research and Development Centres – Koya University for providing the research facilities, institutional support, and controlled environment necessary for data collection. The authors also acknowledge the technical and administrative staff who assisted during the data collection process.
Conflict of interest
The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declared that generative AI was not used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher's note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
Al-Mahadeen, E., Alghamdi, M., Tarawneh, A. S., Alrowaily, M. A., Alrashidi, M., Alkhazi, I. S., et al. (2023). Smartphone user identification/authentication using accelerometer and gyroscope data. Sustainability 15:10456. doi: 10.3390/su151310456
Andersson, R., Bermejo-García, J., Agujetas, R., Cronhjort, M., and Chilo, J. (2024). Smartphone IMU sensors for human identification through hip joint angle analysis. Sensors 24:4769. doi: 10.3390/s24154769
Cao, Q., Xu, F., and Li, H. (2022). User authentication by gait data from smartphone sensors using hybrid deep learning network. Mathematics 10:2283. doi: 10.3390/math10132283
Deb, S., Ou Yang, Y., Chua, M. C. H., and Tian, J. (2020). Gait identification using a new time-warped similarity metric based on smartphone inertial signals. J. Ambient Intell. Humaniz. Comput. 11, 4041–4053. doi: 10.1007/s12652-019-01659-7
Degbey, G.-S., Hwang, E., Park, J., and Lee, S. (2024). Deep learning-based obesity identification system for young adults using smartphone inertial measurements. Int. J. Env. Res. Public Health 21:1178. doi: 10.3390/ijerph21091178
Dehzangi, O., Taherisadr, M., and ChangalVala, R. (2017). IMU-based gait recognition using convolutional neural networks and multi-sensor fusion. Sensors 17:2735. doi: 10.3390/s17122735
Filippou, V., Backhouse, M. R., Redmond, A. C., and Wong, D. C. (2023). Person-specific template matching using a dynamic time warping step-count algorithm for multiple walking activities. Sensors 23:9061. doi: 10.3390/s23229061
Hoang, T., Choi, D., and Nguyen, T. (2015). “On the instability of sensor orientation in gait verification on mobile phone,” in 2015 12th International Joint Conference on E-Business and Telecommunications (ICETE), Vol. 4 (New York, NY: IEEE), 148–159. doi: 10.5220/0005572001480159
Kim, H.-E., Park, D.-H., An, C.-H., Choi, M.-Y., Kim, D., and Hong, Y.-S. (2025). Real-time detection of distracted walking using smartphone IMU sensors with personalized and emotion-aware modeling. Sensors 25:5047. doi: 10.3390/s25165047
Li, S., Zhang, Y., Huang, X., and Lu, G. (2025). Online recognition of human gait based on smartphone sensors. Sens. Mater. 37, 2937–2408. doi: 10.18494/SAM5601
Lin, J., Keogh, E., Lonardi, S., and Chiu, B. (2003). “A symbolic representation of time series, with implications for streaming algorithms,” in Proceedings of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (San Diego, CA; New York, NY: Association for Computing Machinery), 2–11. doi: 10.1145/882082.882086
Mondal, S., Nandy, A., Chakraborty, P., and Nandi, G. (2012). Gait based personal identification system using rotation sensor. J. Emerg. Trends Comput. Inf. Sci. 3, 395–402.
Rafiq, M., Almujally, N. A., Algarni, A., Jalal, A., and Liu, H. (2025). Intelligent biosensors for human movement rehabilitation and intention recognition. Front. Bioeng. Biotechnol. 13:1558529. doi: 10.3389/fbioe.2025.1558529
Raziff, A. R. A., Sulaiman, M. N., Mustapha, N., and Perumal, T. (2017). Gait identification using smartphone handheld placement with linear interpolation factor, single magnitude and one-vs-one classifier mapping. Int. J. Intell. Eng. Syst. 10, 70–80. doi: 10.22266/ijies2017.0831.08
Semwal, V. B., Gaud, N., Lalwani, P., Bijalwan, V., and Alok, A. K. (2022). Pattern identification of different human joints for different human walking styles using inertial measurement unit (IMU) sensor. Artif. Intell. Rev. 55, 1149–1169. doi: 10.1007/s10462-021-09979-x
Shahar, R. T., and Agmon, M. (2021). Gait analysis using accelerometry data from a single smartphone: agreement and consistency between a smartphone application and gold-standard gait analysis system. Sensors 21:7497. doi: 10.3390/s21227497
Keywords: biometry, gait dataset, gait patterns, gait recognition, human gait analysis, smartphone inertial sensors
Citation: Abdulrahman LS, Sabir AT and Maghdid HS (2026) A biometric dataset for unconditioned gait identification using onboard smartphone sensors. Front. Comput. Sci. 8:1752141. doi: 10.3389/fcomp.2026.1752141
Received: 22 November 2025; Revised: 26 December 2025;
Accepted: 09 January 2026; Published: 27 January 2026.
Edited by:
Hans Hallez, KU Leuven, Belgium
Reviewed by:
Chunzhuo Wang, KU Leuven, Belgium
Copyright © 2026 Abdulrahman, Sabir and Maghdid. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Ladeh Sardar Abdulrahman, ladeh.sardar@koyauniversity.org