Bachelor's or Master's Thesis (Leipzig):
Evaluation of the reproducibility of performance measurements on clusters in the cloud
A growing number of scientists appear to be using cloud computing offerings such as Amazon Elastic Compute Cloud or Google Compute Engine for their research. These on-demand computing resources are especially attractive for running big data software such as Apache Hadoop or Apache Flink. For performance measurements, however, a potential problem arises: how reproducible are such tests in the cloud? Initial evaluations of the reproducibility of cloud offerings have already been carried out, but recent measurements, especially on the clusters needed for big data tasks, are missing.
The work includes the following subtasks:
- collection and comparison of performance guarantees of the common cloud offerings
- literature review: existing work on reproducibility in the cloud
- literature review on performance measurements in the cloud in big data research
- theoretical and practical evaluation of reproducibility for common big data benchmarks. This could include dedicated clusters in addition to selected cloud offerings.
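For the practical evaluation, one common way to quantify run-to-run variation is the coefficient of variation (sample standard deviation divided by the mean) over repeated runs of the same benchmark. The following is only a minimal sketch: the function name and the runtime values are hypothetical and chosen for illustration, not measured data.

```python
import statistics


def coefficient_of_variation(runtimes):
    """Return the sample standard deviation divided by the mean."""
    if len(runtimes) < 2:
        raise ValueError("need at least two measurements")
    mean = statistics.mean(runtimes)
    if mean <= 0:
        raise ValueError("mean runtime must be positive")
    return statistics.stdev(runtimes) / mean


# Hypothetical runtimes in seconds for five repetitions of one benchmark.
# These values are illustrative assumptions, not real measurements.
cloud_runs = [100.0, 110.0, 90.0, 120.0, 80.0]
dedicated_runs = [100.0, 101.0, 99.0, 100.5, 99.5]

for name, runs in [("cloud", cloud_runs), ("dedicated", dedicated_runs)]:
    print(f"{name}: CV = {coefficient_of_variation(runs):.4f}")
```

A lower value indicates more reproducible runtimes. A real study would need many more repetitions, spread over different times of day and instance types.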
Literature:
- Hashem, Ibrahim Abaker Targio; Yaqoob, Ibrar; Anuar, Nor Badrul; Mokhtar, Salimah; Gani, Abdullah; Ullah Khan, Samee (2015): The rise of “big data” on cloud computing: Review and open research issues. In: Information Systems 47, pp. 98–115. DOI: 10.1016/j.is.2014.07.006.
- Collins, E. (2014): Big Data in the Public Cloud. In: IEEE Cloud Computing 1 (2), pp. 13–15. DOI: 10.1109/MCC.2014.29.
- Jackson, K. R.; Ramakrishnan, L.; Muriki, K.; Canon, S.; Cholia, S.; Shalf, J. et al. (eds.) (2010): Performance Analysis of High Performance Computing Applications on the Amazon Web Services Cloud. Cloud Computing Technology and Science (CloudCom), 2010 IEEE Second International Conference on.
- Schad, Jörg; Dittrich, Jens; Quiané-Ruiz, Jorge-Arnulfo (2010): Runtime measurements in the cloud: observing, analyzing, and reducing variance. In: Proc. VLDB Endow. 3 (1-2), pp. 460–471. DOI: 10.14778/1920841.1920902.