Master's Thesis (Leipzig): Server-Side Aggregation of Time-Series Data in Distributed NoSQL Databases
Motivation
Time-series data has become increasingly important for Industry 4.0, IoT and data-driven companies. As data volumes grow, NoSQL databases such as Apache Accumulo, Cassandra and HBase are gaining extensions for working with time-series data.
Unfortunately, these extensions are either immature or do not provide exact results for aggregations (min, max, sum, avg, standard deviation, percentile) over large data sets.
Aim
This thesis aims to define a performant schema for exact aggregations, using either Apache Accumulo with server-side iterators or Apache Flink as a distributed computation framework.
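The idea behind server-side aggregation can be sketched as follows. This is a minimal illustration, not the schema or code developed in the thesis. It assumes each server reduces its own partition of the data to a small partial aggregate, and that a client then merges those partials. It does not use the Accumulo or Flink APIs. The names PartialAggregate and aggregate_partitions are hypothetical. The standard deviation uses Welford's single-pass update and the usual pairwise merge formula for combining partial results. Percentiles are left out because they cannot be merged from such fixed-size summaries. Results are exact only up to floating-point rounding.

```python
import math
from dataclasses import dataclass, replace
from typing import Iterable


@dataclass
class PartialAggregate:
    """Fixed-size summary of one partition (hypothetical helper type)."""
    count: int = 0
    total: float = 0.0
    minimum: float = math.inf
    maximum: float = -math.inf
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the mean

    def add(self, value: float) -> None:
        """Fold one value into the summary (Welford's single-pass update)."""
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    def merge(self, other: "PartialAggregate") -> "PartialAggregate":
        """Combine two summaries without revisiting the raw values."""
        if other.count == 0:
            return replace(self)
        if self.count == 0:
            return replace(other)
        n = self.count + other.count
        delta = other.mean - self.mean
        return PartialAggregate(
            count=n,
            total=self.total + other.total,
            minimum=min(self.minimum, other.minimum),
            maximum=max(self.maximum, other.maximum),
            mean=self.mean + delta * other.count / n,
            m2=self.m2 + other.m2 + delta * delta * self.count * other.count / n,
        )

    def std_dev(self) -> float:
        """Population standard deviation; NaN for an empty summary."""
        return math.sqrt(self.m2 / self.count) if self.count else math.nan


def aggregate_partitions(partitions: Iterable[Iterable[float]]) -> PartialAggregate:
    """Simulate the flow: one partial per partition, then a client-side merge."""
    result = PartialAggregate()
    for partition in partitions:
        partial = PartialAggregate()  # would be computed on the server
        for value in partition:
            partial.add(value)
        result = result.merge(partial)  # only the small summary is transferred
    return result
```

In this sketch only six numbers per partition cross the network, regardless of how many values the partition holds, which is the main argument for pushing the aggregation to the servers.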
Contact
- Dr. Eric Peukert
- Matthias Kricke, M. Sc.
Student
- Oliver Swoboda
Publication
- Swoboda, O.: Serverseitige Aggregation von Zeitreihendaten in verteilten NoSQL-Datenbanken. In: BTW (Workshops), pp. 365-373, GI, 2017.