Big Data Cluster in “Shared Nothing” Architecture in Leipzig - Galaxy Infrastructure

Galaxy Infrastructure

Management via Foreman and Puppet

Managing a cluster of this size is quite complex, and installing each node separately is no longer an option: it would require too much manual work, with too many opportunities for errors or inconsistent configurations. For the Galaxy cluster we use the system life cycle management tool Foreman in combination with Puppet for configuration management. Foreman provides the automated system installation, configured via template files, as well as the management infrastructure for Puppet. The Puppet clients on the worker nodes check in with the Foreman server for updated Puppet modules, in which we describe more specific node configurations.
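
To illustrate the kind of node-specific configuration this enables, here is a minimal Puppet manifest sketch; the node name pattern, class name and package list are assumptions for illustration, not the actual ScaDS modules.

```puppet
# Hypothetical sketch of node-specific Puppet configuration;
# the node name pattern, class name and package list are
# illustrative assumptions, not the actual ScaDS modules.
node /^galaxy-worker\d+/ {
  include galaxy::worker
}

class galaxy::worker {
  # Basic software every worker node should have installed.
  package { ['rsync', 'htop', 'tmux']:
    ensure => installed,
  }

  # Keep the Puppet agent running so the node keeps checking in
  # with the Foreman/Puppet server for updated configuration.
  service { 'puppet':
    ensure => running,
    enable => true,
  }
}
```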

Disk Partitioning

One thing we define in Foreman is the hard disk partitioning and the mount options for the worker nodes. We use XFS and ZFS as local filesystems; both are quite mature and support very large files and storage volumes. Currently we define the following partitions for optimal flexibility:

Mount point                          Hard disks  Filesystem  Size      Comment
root and several other directories   HD 1        XFS         ~0.5 TB
/home                                HD 1        XFS         3.5 TB
/scratch_zfsvol                      HD 2-6      ZFS         ~7 TB     software RAID 6 via ZFS, used as a big scratch directory
/scratch/hdfs[1-5]                   HD 2-6      XFS         5x ~2 TB  optimized for Apache Hadoop's HDFS storage
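
As an illustration, the mount layout above could be expressed with Puppet mount resources roughly as follows; the device names and mount options are assumptions, the partitions themselves would be created at installation time via a Foreman partition table template, and the ZFS scratch volume is omitted since ZFS pools are usually managed with dedicated tooling.

```puppet
# Sketch of the mount layout above as Puppet mount resources.
# Device names and mount options are assumptions; /scratch_zfsvol
# is omitted because the ZFS pool is set up separately.
mount { '/home':
  ensure  => mounted,
  device  => '/dev/sda3',   # assumed partition on HD 1
  fstype  => 'xfs',
  options => 'defaults,noatime',
}

# One XFS partition per data disk (HD 2-6) used for HDFS storage.
$hdfs_mounts = {
  '/scratch/hdfs1' => '/dev/sdb1',
  '/scratch/hdfs2' => '/dev/sdc1',
  '/scratch/hdfs3' => '/dev/sdd1',
  '/scratch/hdfs4' => '/dev/sde1',
  '/scratch/hdfs5' => '/dev/sdf1',
}

$hdfs_mounts.each |$path, $device| {
  mount { $path:
    ensure  => mounted,
    device  => $device,
    fstype  => 'xfs',
    options => 'defaults,noatime',
  }
}
```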

Login Management

We use a dedicated login database to give scientists from several research locations in Saxony and the surrounding region access to the big data cluster in Leipzig. This database of scientific computing logins is kept separate from Leipzig University's login database. Researchers can register for a scientific computing login via a management portal, using the DFN AAI authentication and authorization service; the same portal is also used to reset passwords.

Technically, these scientific computing logins are stored in an Active Directory, and each worker node is connected to it using the SSSD package.
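
As a minimal sketch of what this could look like when managed through Puppet, the fragment below installs SSSD and points it at the Active Directory; the domain name and settings are placeholders, and the Kerberos/realm join that a real setup requires is not shown.

```puppet
# Minimal sketch of SSSD managed via Puppet; the AD domain name and
# settings are placeholders, and the Kerberos/realm join step that a
# real deployment needs is not shown.
package { 'sssd':
  ensure => installed,
}

file { '/etc/sssd/sssd.conf':
  ensure  => file,
  owner   => 'root',
  group   => 'root',
  mode    => '0600',
  require => Package['sssd'],
  notify  => Service['sssd'],
  content => @(SSSD),
    [sssd]
    services = nss, pam
    domains = sc.example.org

    [domain/sc.example.org]
    id_provider = ad
    access_provider = ad
    ad_domain = sc.example.org
    cache_credentials = True
    | SSSD
}

service { 'sssd':
  ensure => running,
  enable => true,
}
```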

Gateway

The gateway node is special in that it is the public entry point to the cluster. Typical tasks handled on the gateway node via a secure shell (SSH) are:

  • data transfer to or from the cluster
  • job submission
  • access to the job subsystems for progress or error information
  • direct login to a worker node in use
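
A rough sketch of how such a public entry point could be locked down with Puppet is shown below; the Active Directory group name, the drop-in path and the assumption of an OpenSSH version that reads sshd_config.d are illustrative, not the actual gateway configuration.

```puppet
# Rough sketch of limiting gateway SSH logins to the scientific
# computing group from Active Directory; the group name and drop-in
# path are assumptions (requires an OpenSSH that reads sshd_config.d).
file { '/etc/ssh/sshd_config.d/50-gateway-access.conf':
  ensure  => file,
  owner   => 'root',
  group   => 'root',
  mode    => '0600',
  content => "AllowGroups sci-computing\n",
  notify  => Service['sshd'],
}

service { 'sshd':
  ensure => running,
  enable => true,
}
```
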
For the Galaxy cluster, quite a few management and master components are needed, as well as some basic software on each worker node. The Cluster Manager, the Register Service, and the Gateway interact with the outside world. The Cluster Manager is introduced later in this article.