
COMPETENCE CENTER
FOR SCALABLE DATA SERVICES
AND SOLUTIONS

SHK-Position, Bachelor/Master Thesis (Leipzig): Large-scale imputation of genotypes

In cooperation with the IMISE at the University of Leipzig, we aim to improve the performance of so-called imputation workflows. Imputation in genetics refers to the statistical inference of unobserved genotypes. It is achieved by using known reference data, for instance from the 1000 Genomes Project in humans. The imputed genotypes can then be used in further studies that relate genetic variants to certain traits.

Unfortunately, imputation is computationally very expensive: depending on the number of available processors, a run can take weeks or months.

At ScaDS we have begun to parallelize the imputation job in an embarrassingly parallel manner and achieve significant time savings by running it on an HPC infrastructure in more than 500 parallel jobs or on a shared-nothing infrastructure with up to 90 nodes. In this context we rely on SLURM, but we also plan to evaluate alternatives such as UNICORE.
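To illustrate what "embarrassingly parallel" can mean here, the following is a minimal sketch and not the actual ScaDS workflow. It assumes that a chromosome is cut into independent, fixed-size windows and that each window is imputed by one task of a SLURM job array. The chromosome name `chr_toy`, its length, the 5 Mb window size, the file `chunks.txt`, and the wrapper `run_imputation.sh` are all hypothetical placeholders.

```python
from typing import List, Tuple


def make_chunks(chrom: str, length_bp: int, chunk_bp: int) -> List[Tuple[str, int, int]]:
    """Split one chromosome into consecutive, non-overlapping 1-based windows."""
    chunks = []
    start = 1
    while start <= length_bp:
        end = min(start + chunk_bp - 1, length_bp)
        chunks.append((chrom, start, end))
        start = end + 1
    return chunks


def sbatch_script(n_chunks: int, chunk_file: str) -> str:
    """Render a SLURM job-array script with one array task per chunk."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --array=0-{n_chunks - 1}",
        "#SBATCH --job-name=impute",
        "# Each task reads its own line (chrom start end) from the chunk file.",
        f'read -r CHROM START END < <(sed -n "$((SLURM_ARRAY_TASK_ID + 1))p" {chunk_file})',
        "# run_imputation.sh is a hypothetical wrapper around the imputation tool.",
        './run_imputation.sh "$CHROM" "$START" "$END"',
    ]
    return "\n".join(lines) + "\n"


# Toy example: a 12 Mb chromosome in 5 Mb windows (both values are assumptions).
chunks = make_chunks("chr_toy", 12_000_000, 5_000_000)
script = sbatch_script(len(chunks), "chunks.txt")
```

In such a setup, the chunk list would be written to `chunks.txt` (one "chrom start end" line per window) and the rendered script submitted with `sbatch`. Because the tasks do not communicate, the scheduler can run as many of them concurrently as the cluster allows. Real imputation runs typically also add flanking buffer regions around each window, which this sketch leaves out.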

There are a number of interesting problems to solve, and we are looking for student assistants (SHK) and Bachelor's/Master's thesis students to support our work.

Students should bring:

  • Motivation to dive into imputation and parallelization topics
  • Some Linux skills (helpful)
  • We are also looking for students with web-development skills to build services and front ends on top of the parallel imputation solution

Contact: 

  • Dr. Eric Peukert
  • Holger Kirsten