SHK-Position, Bachelor/Master Thesis (Leipzig): Large-scale imputation of genotypes
In cooperation with IMISE at the University of Leipzig, we aim to improve the performance of so-called imputation workflows. In genetics, imputation refers to the statistical inference of unobserved genotypes. It relies on known reference data, for instance from the 1000 Genomes Project in humans. The additional imputed genotypes can then be used in further studies that relate genetic variants to particular traits.
Unfortunately, imputation is computationally very expensive: depending on the number of available processors, computing a result can take weeks or months.
At ScaDS, we have begun to parallelize the imputation job in an embarrassingly parallel manner. This lets us achieve significant time savings by running it either on an HPC infrastructure with more than 500 parallel jobs or on a shared-nothing infrastructure with up to 90 nodes. In this context we rely on SLURM, but we also plan to evaluate alternatives such as UNICORE.
There are a number of interesting problems to solve, and we are looking for student assistants or bachelor's/master's students to support our work.
Students should bring:
- Motivation to dive into imputation and parallelization topics
- Some Linux skills (helpful)
- We are also looking for students with web development skills to build services and front ends on top of the parallel imputation solution