| Available via | http://dbpubs.stanford.edu/pub/2003-68 |
| Previous version | 2003-19 |
|
Submitted on |
22nd of October 2003 |
|
Author |
Babcock, Brian; Babu, Shivnath; Datar, Mayur; Motwani, Rajeev; Thomas, Dilys |
|
Title |
Operator Scheduling in Data Stream Systems |
|
Date of publication |
2003 |
|
Citation |
Babcock, Brian; Babu, Shivnath; Datar, Mayur; Motwani, Rajeev; Thomas, Dilys. Operator Scheduling in Data Stream Systems, |
|
Number of pages |
31 |
|
Language |
English |
|
Project |
STREAM |
|
Type |
Technical Report |
|
Subject group |
Data Streams; Miscellaneous |
|
Abstract |
In many applications involving continuous data streams, data arrival
is bursty and data rate fluctuates over time. Systems that seek to
give rapid or real-time query responses in such an environment must be
prepared to deal gracefully with bursts in data arrival without
compromising system performance. We discuss one strategy for
processing bursty streams --- adaptive, load-aware scheduling of
query operators to minimize resource consumption during times of peak
load. We show that the choice of an operator scheduling strategy can
have significant impact on the run-time system memory usage as well as
output latency. Our aim is to design a scheduling strategy that
minimizes the maximum run-time system memory, while maintaining the
output latency within prespecified bounds. We first present
Chain scheduling, an operator scheduling strategy for data stream
systems that is near-optimal in minimizing run-time memory usage for
any collection of single-stream queries involving selections,
projections, and foreign-key joins with stored relations. Chain
scheduling also performs well for queries with sliding-window joins
over multiple streams, and multiple queries of the above
types. However, during bursts in input streams, when there is a
buildup of unprocessed tuples, Chain scheduling may lead to high
output latency. We study the online problem of minimizing maximum
run-time memory, subject to a constraint on maximum latency. We
present preliminary observations, negative results, and heuristics for
this problem. A thorough experimental
evaluation is provided where we demonstrate the potential benefits of
Chain scheduling and its different variants,
compare it with competing scheduling strategies,
and validate our analytical conclusions. |
|
Contact address |
shivnath@stanford.edu |
|
Notes |
This paper is an extended version of our paper
titled "Chain: Operator Scheduling for Memory Minimization in Data Stream
Systems" that appeared
in the proceedings of SIGMOD 2003. The basic Chain algorithm
and its theoretical and experimental analysis were reported
in the SIGMOD paper. The NP-completeness result showing the intractability
of the problem of minimizing memory in Section 4,
and the theoretical results and experiments
for handling latency constraints in Sections 5
and 6.2 respectively are being presented for the first
time in this paper. |
| Fulltext source |
Postscript (ps, ps.gz, ps.zip)
PDF (pdf, pdf.gz, pdf.zip)
| Management of the document by | rwesley@stanford.edu
| |