
CENTER FOR
SCALABLE DATA ANALYTICS
AND ARTIFICIAL INTELLIGENCE

Master's thesis (Dresden)

The Big Data Framework Thrill: An Investigation of Functionality and Performance

The goal of the thesis is to investigate the functionality and performance of the Big Data processing framework Thrill. The framework is written in C++ and offers an MPI communication backend. We first examine the framework's features. We then describe how to configure and execute a Thrill application on an HPC cluster. The word count benchmark and a use case with data from an HPC cooling system are implemented using Thrill. We proceed to evaluate Thrill as an HPC application and run performance comparisons with Apache Spark. Thrill showed good HPC scalability characteristics and outperformed Apache Spark. The faster execution time compared to Apache Spark, together with the native C++ implementation, opens new possibilities for faster data processing with Thrill on an HPC cluster.
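
For readers unfamiliar with the word count benchmark mentioned above, the following is a minimal single-process sketch of its logic. It uses only the C++ standard library and does not use Thrill's API; the function names `map_line` and `word_count` are hypothetical and chosen for illustration. In a distributed framework, the same two steps (emit a pair per word, then sum counts per key) would run partitioned across workers.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// "Map" step (illustrative): split one line on whitespace and
// emit a (word, 1) pair for every word found.
std::vector<std::pair<std::string, std::size_t>> map_line(const std::string& line) {
    std::vector<std::pair<std::string, std::size_t>> pairs;
    std::istringstream in(line);
    std::string word;
    while (in >> word) {
        pairs.emplace_back(word, 1);
    }
    return pairs;
}

// "Reduce by key" step (illustrative): sum the counts of identical words.
std::map<std::string, std::size_t> word_count(const std::vector<std::string>& lines) {
    std::map<std::string, std::size_t> counts;
    for (const auto& line : lines) {
        for (const auto& p : map_line(line)) {
            counts[p.first] += p.second;
        }
    }
    return counts;
}
```

Calling `word_count` on a vector of text lines returns a map from each distinct word to its number of occurrences. This sketch says nothing about performance; it only shows what the benchmark computes.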

 

Contact: Dr. Tara Lazariv, Dr. Christoph Lehmann