ScaDS: Competence Center for Scalable Data Services and Solutions

Master's Thesis (Leipzig): Server-Side Aggregation of Time-Series Data in Distributed NoSQL Databases

Motivation

Time-series data has become increasingly important for Industry 4.0, the IoT, and data-driven companies. As data volumes grow, NoSQL databases such as Apache Accumulo, Cassandra, and HBase provide extensions for working with time-series data.

Unfortunately, these extensions are either immature or do not provide exact results for aggregations (min, max, sum, average, standard deviation, percentile) over large data sets.

Aim

This thesis aims to define a performant schema for exact aggregations, using either Apache Accumulo with server-side iterators or Apache Flink as a distributed computation framework.
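
To illustrate the general idea behind server-side aggregation, here is a minimal Python sketch in which each server reduces its own partition to a small summary and only those summaries are combined. It is not the schema or implementation developed in the thesis. The names PartialAggregate, summarize, and merge are hypothetical, and results are exact only up to floating-point rounding. The squared-deviation term is merged with the well-known pairwise update for variance.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class PartialAggregate:
    """Summary of one partition's values (hypothetical helper type)."""
    count: int
    total: float
    minimum: float
    maximum: float
    m2: float  # sum of squared deviations from the partition mean

    @property
    def mean(self) -> float:
        return self.total / self.count

    @property
    def std_dev(self) -> float:
        # Population standard deviation over all summarized values.
        return math.sqrt(self.m2 / self.count)


def summarize(values):
    """Compute the partial aggregate for one non-empty partition."""
    count = len(values)
    total = math.fsum(values)
    mean = total / count
    m2 = math.fsum((v - mean) ** 2 for v in values)
    return PartialAggregate(count, total, min(values), max(values), m2)


def merge(a, b):
    """Combine two partial aggregates without revisiting the raw values."""
    count = a.count + b.count
    delta = b.mean - a.mean
    # Pairwise update: the correction term accounts for the differing means.
    m2 = a.m2 + b.m2 + delta * delta * a.count * b.count / count
    return PartialAggregate(
        count,
        a.total + b.total,
        min(a.minimum, b.minimum),
        max(a.maximum, b.maximum),
        m2,
    )


# Two partitions, as they might live on two different servers.
combined = merge(summarize([1.0, 2.0, 3.0]), summarize([4.0, 5.0, 6.0, 7.0, 8.0]))
print(combined.count, combined.minimum, combined.maximum, combined.mean)
```

Percentiles cannot be merged from fixed-size summaries like this. An exact percentile needs an additional distributed selection step, so it is left out of the sketch.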

Contact

  • Dr. Eric Peukert

  • Matthias Kricke, M. Sc.

Student

  • Oliver Swoboda

Publication

  • Swoboda, O.: Serverseitige Aggregation von Zeitreihendaten in verteilten NoSQL-Datenbanken. In BTW (Workshops) (pp. 365-373), GI 2017
