## A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window
Busch, Costas and Tirthapura, Srikanta
(2006)
Full text available as:
## AbstractWe consider the problem of maintaining aggregates over recent elements of a massive data stream. Motivated by applications involving network data, we consider {\em asynchronous} data streams, where the observed order of data may be different from the order in which the data was generated. The set of recent elements is modeled as a {\em sliding timestamp window} of the stream, whose elements are changing continuously with time. We present the first {\em deterministic} algorithms for maintaining a small space summary of elements in a sliding timestamp window of an asynchronous data stream. The summary can return approximate answers for the following fundamental aggregates: {\em basic count}, the number of elements within the sliding window, and {\em sum}, the sum of all element values within the sliding window. For basic counting, the space taken by our summary is $O(\log W \cdot \log B \cdot (\log W + \log B)/\epsilon)$ bits, where $B$ is an upper bound on the value of the basic count, $W$ is an upper bound on the width of the timestamp window, and $\epsilon$ is the desired relative error. Our algorithms are based on a novel data structure called {\em splittable histogram}. Prior to this work, randomized algorithms were known for this problem, which provide weaker guarantees than those provided by our deterministic algorithms.
Archive Staff Only: edit this record |