COMPETENCE CENTER
FOR SCALABLE DATA SERVICES
AND SOLUTIONS

Bachelor's or Master's Thesis (Leipzig):

Distributed Percentile Calculation for Floating Point Numbers in NoSQL Databases

Motivation

"Data is the oil of the 21st century." This holds especially for data-driven companies and for companies undergoing the digital revolution, where fast access to exact key figures on vast amounts of data is very important. However, calculating the median or freely chosen percentiles of floating point numbers is challenging in distributed NoSQL databases.

Aim

This thesis aims to define and implement an exact algorithm for percentiles on floating point numbers, using either Apache Accumulo with server-side iterators or Apache Flink as the distributed computation framework.
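
To make the problem concrete, the following is a minimal single-process sketch of one possible exact approach. It is not the algorithm the thesis is expected to produce, and it uses neither Accumulo nor Flink. It assumes that all values are finite doubles (no NaN), that the percentile is defined by the nearest-rank method, and that each partition can answer a "how many values are at most x" query locally. The function names (float_to_key, count_at_most, exact_percentile) are hypothetical. The idea is to map every double to an order-preserving 64-bit integer key and then binary-search the key space, so that only counts travel between the partitions and the coordinator.

```python
import math
import struct
from fractions import Fraction

ALL_ONES = 0xFFFFFFFFFFFFFFFF
SIGN_BIT = 1 << 63


def float_to_key(x: float) -> int:
    """Map a finite double to an unsigned 64-bit key with the same ordering."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    # Negative doubles: invert all bits. Non-negative doubles: set the sign bit.
    return bits ^ ALL_ONES if bits >> 63 else bits | SIGN_BIT


def key_to_float(key: int) -> float:
    """Inverse of float_to_key."""
    bits = key ^ SIGN_BIT if key >> 63 else key ^ ALL_ONES
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


def count_at_most(partition, key: int) -> int:
    """Local step (assumed to run on each node): count values with key <= `key`."""
    return sum(1 for x in partition if float_to_key(x) <= key)


def exact_percentile(partitions, p):
    """Exact nearest-rank percentile over all partitions, for 0 <= p <= 100."""
    n = sum(len(part) for part in partitions)
    if n == 0:
        raise ValueError("no data")
    # Nearest-rank definition: the k-th smallest value with k = ceil(p/100 * n).
    # Fraction(str(p)) avoids binary rounding errors when computing the rank.
    k = max(1, math.ceil(Fraction(str(p)) * n / 100))
    lo, hi = 0, ALL_ONES
    # Binary search over the key space: at most 64 rounds, each round
    # exchanging one integer count per partition.
    while lo < hi:
        mid = (lo + hi) // 2
        if sum(count_at_most(part, mid) for part in partitions) >= k:
            hi = mid
        else:
            lo = mid + 1
    # lo is the smallest key with at least k values at or below it,
    # which is the key of the k-th smallest value in the data.
    return key_to_float(lo)
```

For example, with the three partitions [3.5, -1.0, 2.25], [0.0, 10.0] and [-7.5, 2.25, 0.001], calling exact_percentile(partitions, 50) selects the fourth-smallest of the eight values. The sketch trades network volume for round trips: each round scans every partition once, which a real implementation would likely want to reduce, for example by narrowing the candidate range with per-partition histograms first.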

Contact

  • Dr. Eric Peukert

  • Matthias Kricke, M. Sc.

References

[1] Knuth, Donald Ervin: The Art of Computer Programming. Volume 2, Seminumerical Algorithms. p. 216, 1998.

[2] Saukas, Einar L. G.; Song, Siang W.: Efficient selection algorithms on distributed memory computers. In: Proceedings of the 1998 ACM/IEEE Conference on Supercomputing. IEEE Computer Society, pp. 1–26, 1998.