Renewable quantile regression for streaming datasets

Authors:

Highlights:

Abstract:

Streaming data analysis, in which large amounts of data arrive continuously, has drawn much attention. Because limited memory can hold only a small batch of data at a time, analysis must be fast and must not require access to historical data. Quantile regression is widely used in many fields because of its robustness and the comprehensive picture it gives of the conditional distribution. In the streaming setting, however, conventional quantile regression methods are difficult to apply, because they all assume that the full dataset fits in memory. To address this issue, this paper proposes a novel online renewable quantile regression strategy, in which the estimator is renewed using only the current data and summary statistics of the historical data. The method is therefore computationally efficient and not storage-intensive. Moreover, the theoretical results confirm that the proposed estimator is asymptotically equivalent to the oracle estimator computed on the entire dataset at once. Numerical experiments on both synthetic and real data verify the theoretical results and illustrate the good performance of the new method.
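
To make the streaming constraint in the abstract concrete, the following is a minimal sketch of a quantile regression fit that sees each batch once and then discards it. It is not the renewable estimator proposed in the paper, whose details are not given here. It substitutes a plain stochastic subgradient descent on the check (pinball) loss, and the only state carried between batches is the coefficient vector and a batch counter. The class name StreamingQuantileSGD, the step-size schedule, and the simulated data model y = 1 + 2x + noise are all assumptions made for illustration, and this simple scheme does not carry the oracle-equivalence guarantee described in the abstract.

```python
import random


def pinball_subgradient(beta, batch, tau):
    """Average subgradient of the check (pinball) loss over one batch."""
    grad = [0.0] * len(beta)
    for x, y in batch:
        residual = y - sum(b * xj for b, xj in zip(beta, x))
        # Subgradient of rho_tau(residual) w.r.t. beta: -(tau - 1{residual < 0}) * x
        weight = tau - (1.0 if residual < 0 else 0.0)
        for j, xj in enumerate(x):
            grad[j] -= weight * xj
    return [g / len(batch) for g in grad]


class StreamingQuantileSGD:
    """Hypothetical helper: stores only beta and a batch counter between batches."""

    def __init__(self, n_features, tau, step0=1.0):
        self.beta = [0.0] * n_features
        self.tau = tau
        self.step0 = step0  # assumed initial step size
        self.n_batches = 0

    def update(self, batch):
        """Take one subgradient step using the current batch only."""
        self.n_batches += 1
        step = self.step0 / self.n_batches ** 0.5  # decaying step size
        grad = pinball_subgradient(self.beta, batch, self.tau)
        self.beta = [b - step * g for b, g in zip(self.beta, grad)]
        return self.beta


def simulate_stream(n_batches=200, batch_size=100, seed=0):
    """Yield synthetic batches from y = 1 + 2*x + N(0, 1) noise (assumed model)."""
    rng = random.Random(seed)
    for _ in range(n_batches):
        batch = []
        for _ in range(batch_size):
            x1 = rng.gauss(0.0, 1.0)
            y = 1.0 + 2.0 * x1 + rng.gauss(0.0, 1.0)
            batch.append(((1.0, x1), y))  # leading 1.0 is the intercept term
        yield batch


model = StreamingQuantileSGD(n_features=2, tau=0.5)
for batch in simulate_stream():
    model.update(batch)  # the batch is not kept after this call
print(model.beta)  # intercept and slope estimates for the median
```

With tau = 0.5 and symmetric noise, the conditional median coincides with the line 1 + 2x, so the estimates should settle near those coefficients. Memory use is independent of the number of batches seen, which is the property the abstract asks of a streaming method.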

Keywords: Streaming data environment, Quantile regression, Online updating learning

Review history: Received 6 July 2021, Revised 20 September 2021, Accepted 31 October 2021, Available online 2 November 2021, Version of Record 8 November 2021.

Official paper URL: https://doi.org/10.1016/j.knosys.2021.107675