COMPETENCE CENTER
FOR SCALABLE DATA SERVICES
AND SOLUTIONS

TEC-3: Hadoop usage

 

Service Owner

ZIH

Contact

Jan Frenzel

Target

Developers, high-level programmers (Java), and data scientists who need distributed computing but do not want to implement parallelization aspects, such as communication or data distribution, themselves.

Dependencies

TEC-1

Description

You can evaluate and use our Hadoop cluster for your data analysis. High-performance computers are not the only way to distribute data storage and processing and thereby reduce runtime: commodity hardware can also be used, in a so-called shared-nothing cluster. The architecture and concepts of such a cluster differ from those common in HPC, and it supports other programming paradigms, such as MapReduce. Our Hadoop cluster lets you test this architecture and analyze your data. You can use it as a stand-alone cluster or share it with other users. If you are planning to employ other software based on Hadoop, you can ask us for support.
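
As a rough illustration of the MapReduce paradigm mentioned above, the following sketch runs a word count through the three phases (map, shuffle, reduce) in a single process. It uses only the Java standard library (Java 9 or newer) and deliberately not the Hadoop API; the class name MapReduceSketch and its method names are hypothetical and chosen for this example only. On a real cluster, the framework distributes these phases across machines and handles communication and data distribution for you.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, single-process sketch of the MapReduce phases.
// This is NOT the Hadoop API; it only illustrates the programming model.
public class MapReduceSketch {

    // Map phase: turn one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by their key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: combine all values of one key into a single result.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Runs the three phases sequentially over all input lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(pairs).entrySet()) {
            result.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {be=2, not=1, or=1, to=2}
        System.out.println(wordCount(List.of("to be or not", "to be")));
    }
}
```

The programmer writes only the map and reduce steps; grouping by key and, on a cluster, moving data between nodes is the part the framework takes over.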

Offerings

  •  login to our Hadoop cluster

  •  a dedicated Hadoop cluster environment with a configuration tailored to your needs

  •  hosting of Hadoop applications

  •  quickstart guides for writing Hadoop jobs

Consumption

  •  Collect information about your use case. Prepare for the following questions:

    •  Do you only want to evaluate Apache Hadoop?

    •  Do you already have a (serial/parallel) program?

    •  What challenge are you addressing with your program? This can include:

      •  Analyzing data streams (in parallel)

      •  Distributed batch processing

    •  How long does your (serial/parallel) program take to complete?

    •  Which resources do you need? (type of computing resources and amount of computing time required)

    •  Who is responsible for your project?

  •  Contact us (via e-mail or phone)

  •  We send you an application form. The form tells us about your use case and its specific requirements, which helps us to provide any additional software you might need. We also need the form to request computing resources.

  •  Fill out the form and send it back to us.

  •  We contact you once your login has been granted and you can access our cluster. This might take some time.

  •  We send you information on how to use our cluster. This includes material on how to log on, submit jobs to the cluster, write programs, and avoid potential bottlenecks.