AUTHOR=Darji Jay, Biswas Nupur, Padul Vijay, Gill Jaya, Kesari Santosh, Ashili Shashaanka

TITLE=Efficient use of binned data for imputing univariate time series data

JOURNAL=Frontiers in Big Data

VOLUME=Volume 7 - 2024

YEAR=2024

URL=https://www.frontiersin.org/journals/big-data/articles/10.3389/fdata.2024.1422650

DOI=10.3389/fdata.2024.1422650

ISSN=2624-909X

ABSTRACT=Time-series data is recorded in many sectors, producing large volumes of data. The continuity of this data is often interrupted, creating periods of missing data. Several algorithms are used to impute the missing data, and their performance varies widely. Apart from the choice of algorithm, effective imputation depends strongly on the nature of the missing and available data.
We performed extended studies using different types of time-series data, namely heart rate data and power consumption data. We removed data over different time spans and imputed the resulting gaps using different algorithms with binned data of different sizes. Performance was evaluated using the root mean square error (RMSE).
We observed a reduction in RMSE when binned data was used instead of the entire dataset, specifically for the expectation-maximization algorithm. For 1 min, 5 min and 15 min of missing data, RMSE was reduced when binned data was used, and the reduction was greater for 15 min of missing data. We also observed an effect of data fluctuation.
We conclude that the usefulness of binned data depends on the span of the missing data, the sampling frequency of the data and the fluctuation within the data. Depending on the inherent characteristics, quality and quantity of the missing and available data, binned data can be used to impute a wide variety of data, including biological heart-rate data derived from an IoT smartwatch as well as non-biological data such as household power consumption data.