Architecture, Data Model and Real-Time Performance Evaluation of the Streamonas Data Stream Management System
- Author(s): Michael, Panayiotis Adamos
- Advisor(s): Parker, Douglass S; Manousiouthakis, Vasilios I; et al.
Within the challenging environment of data streams, characterized by large volumes of data and high data-flow rates, most Data Stream Management Systems (DSMSs) process information directly from the incoming serial data stream.
A radically different approach is followed by the Streamonas DSMS presented in this dissertation. Streamonas processes its information after the serial stream has been restructured into its own novel, object-oriented, parallel data model, called the Spatio-Temporal Cuboid (ST-Cuboid).
By doing so, Streamonas has achieved both record-level performance and efficiency. As published, extensive experiments over the Linear Road Benchmark (LRB) show that the single-CPU/single-core Streamonas is the first, and still the only, DSMS to have reached the maximum level of difficulty of the benchmark, i.e. 10 XWays, on a single CPU.
Distributed Streamonas, also presented in this dissertation, has recently achieved a level of 550 XWays on the LRB, which is also a record. The work has been submitted for publication.
While query latencies on the LRB reported by excellent DSMSs are on the order of a second, Streamonas demonstrated impressive query latencies on the order of a microsecond (single-CPU/single-core Streamonas) and a millisecond (Distributed Streamonas).
Enforcing database consistency in real time is so difficult that researchers often view the concept as a trade-off of accuracy for efficiency. Within this context, researchers have developed pioneering frameworks including semantic load shedding, data reduction, and application-specific summary techniques.
Based on its framework named "Real-Time Consistency", the novel Streamonas architecture has been able to achieve 100% accurate results with excellent performance characteristics and without trade-offs, so that consistent database states are materialized within time intervals of less than 400 microseconds. These results have been achieved for large, distributed, enterprise-like applications in which the in-memory distributed database reaches 65 GBytes and throughput exceeds 730,000 tuples/sec.
Our thesis is that the novel architecture of Streamonas, which follows three fundamental principles and is supported by the novel, object-oriented, parallel ST-Cuboid data structure and the other novel sub-systems and frameworks presented in this dissertation, has radically changed the perspective from which the research community views data-stream management, permitting the development of new kinds of improved Data Stream Management Systems in the years to come.