COMPETENCE CENTER
FOR SCALABLE DATA SERVICES
AND SOLUTIONS

TEC-2: Apache Flink cluster usage/processing capabilities 

 

Contact

Jan Frenzel

Target

Developers, high-level programmers (Java or Scala), or data scientists who require distributed computing but do not want to implement parallelization aspects, such as communication or data distribution, themselves.

Dependencies

TEC-1

Description

The ScaDS team offers access to its preconfigured Apache Flink cluster. With Apache Flink, a parallel program is written by combining predefined operators with user-defined functions. The user does not need to handle parallelization, data partitioning, or data distribution, because the framework takes care of these aspects automatically. Instead, the user provides functions that are wrapped by Apache Flink's operators. As a result, user code does not need to call asynchronous or communication functions directly. Once the execution plan has been created, the program can easily be sent to a cluster for execution.
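The operator and user-defined-function idea can be illustrated with a minimal sketch. This is not Flink code: it uses Java's standard parallel streams as a stand-in so that it runs without any cluster or extra library. The class name OperatorSketch and the methods tokenize and countWords are hypothetical and chosen only for this illustration. In an actual Flink DataStream program, operators such as flatMap, keyBy, and sum would play the corresponding roles.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustration only: plain Java streams standing in for a dataflow framework.
public class OperatorSketch {

    // User-defined function: splits one line into lower-case words.
    // This is the only application logic the user writes.
    static Stream<String> tokenize(String line) {
        return Arrays.stream(line.toLowerCase().split("\\W+"))
                     .filter(word -> !word.isEmpty());
    }

    // The pipeline names operators and plugs in user functions.
    // No threads, locks, or messages appear in user code.
    static Map<String, Long> countWords(List<String> lines) {
        return lines.parallelStream()                 // the runtime decides how to split the input
                .flatMap(OperatorSketch::tokenize)    // operator wrapping the user function
                .collect(Collectors.groupingBy(       // group by key (the word) ...
                        Function.identity(),
                        TreeMap::new,                 // sorted map for stable output
                        Collectors.counting()));      // ... and count per key
    }

    public static void main(String[] args) {
        List<String> lines = List.of("to be or not to be", "that is the question");
        // prints {be=2, is=1, not=1, or=1, question=1, that=1, the=1, to=2}
        System.out.println(countWords(lines));
    }
}
```

The point of the sketch is the division of labor: the user supplies tokenize, while splitting the input and merging partial results is left to the runtime. A framework such as Apache Flink applies the same division across the machines of a cluster rather than across the threads of one process.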

Offerings

  • Quickstart guides and example programs for the first steps with Apache Flink are available. Additionally, we collect the challenges that users are facing and possible solutions.

  • For beginners, we provide quickstart guides for starting Apache Flink jobs. We configure clusters to the users' needs. Additionally, we plan to provide a more comfortable way to analyze and monitor your jobs, e.g. via a new GUI.

  • Applications that need to be available to a larger group or that have special uptime or storage requirements can be run on ZIH resources, so that users do not need to provide the resources (storage or computing power) themselves.

Consumption

  • Collect information about your use case. Prepare for the following questions:

      • Do you only want to evaluate Apache Flink?

      • Do you already have a (serial/parallel) program?

      • What challenge does your program address? This can include:

          • Analyzing data streams (in parallel)

          • Analyzing graphs

      • How long does your program take to complete (serial/parallel)?

      • How many resources do you need? (type of computing resources and amount of required computing time)

      • Who is responsible for your project?

  • Contact us (via e-mail or phone)

  • We send you an application form. With this form, we want to have a look at your use case and see the specific requirements. This helps us to provide any additional software you might need. Additionally, we need this form to request computing resources.

  • Fill out the form and send it back to us.

  • We contact you once your login has been granted and you can access our cluster. This might take some time.

  • We send you information about how to use our cluster. This includes material on how to log on to the cluster, submit jobs, write programs, and avoid potential bottlenecks.