COMPETENCE CENTER
FOR SCALABLE DATA SERVICES
AND SOLUTIONS

Golden Genome Project

 

Background

In recent years many new genomes have been sequenced, so the amount of data in genomics and transcriptomics keeps growing. This opens up great possibilities for analyzing a single species, and also for comparing two or more species with each other. Such a comparison is based on the sequences, but this is only the starting point: in biology, not all mutations can be detected by a local comparison of two sequences. The solution is to draw more information from the context, where the context consists of the surrounding sequences as well as sequences in other species that share the same origin (homologous sequences). Using this information, a common coordinate system can be created, which makes comparisons between species straightforward.

 

Objectives

In this project we aim to establish a tool that creates a common coordinate system from a given genome alignment. The tool uses graphs as its backbone data structure. Graphs have two advantages: on the one hand, graph rewriting can be used to reduce the complexity of the problem; on the other hand, they offer different views on the common coordinate system. Because the graphs grow quickly in size, scalable Big Data technologies are used so that the system can handle almost any input size.
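The idea of a graph backbone with rewriting can be illustrated with a small Python sketch. It is a toy under stated assumptions, not the project's implementation: the input format (`blocks`), the function names, and the single rewrite rule shown (contracting linear chains) are all hypothetical and chosen only for illustration. Each alignment block becomes a node, an edge connects blocks that are adjacent in some species, and a topological order of the reduced graph serves as a very simple stand-in for a common coordinate system.

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical toy input: for each alignment block, the (start, end)
# interval it covers in every species that contains it.
blocks = {
    "A": {"human": (0, 100), "mouse": (0, 90)},
    "B": {"human": (100, 200), "mouse": (90, 180)},
    "C": {"human": (200, 300)},  # assumed to be missing in mouse
    "D": {"human": (300, 400), "mouse": (180, 270)},
}


def build_adjacency_graph(blocks):
    """Add an edge u -> v whenever block v directly follows u in some species."""
    successors = defaultdict(set)
    for block in blocks:
        successors[block]  # touch the key so isolated blocks become nodes
    species = {name for intervals in blocks.values() for name in intervals}
    for name in species:
        ordered = sorted(
            (b for b in blocks if name in blocks[b]),
            key=lambda b: blocks[b][name][0],
        )
        for u, v in zip(ordered, ordered[1:]):
            successors[u].add(v)
    return dict(successors)


def merge_linear_chains(successors):
    """Rewrite rule (assumed): contract u -> v if v is the only successor
    of u and u is the only predecessor of v."""
    succ = {u: set(vs) for u, vs in successors.items()}
    members = {u: [u] for u in succ}
    changed = True
    while changed:
        changed = False
        preds = defaultdict(set)
        for u, vs in succ.items():
            for v in vs:
                preds[v].add(u)
        for u in list(succ):
            if len(succ[u]) != 1:
                continue
            (v,) = succ[u]
            if v == u or preds[v] != {u}:
                continue
            succ[u] = succ.pop(v)  # u inherits the outgoing edges of v
            members[u] += members.pop(v)
            changed = True
            break  # predecessor sets are stale now, recompute them
    return succ, members


def common_order(succ):
    """Order the nodes so that every edge points forward."""
    preds = {u: set() for u in succ}
    for u, vs in succ.items():
        for v in vs:
            preds[v].add(u)
    return list(TopologicalSorter(preds).static_order())


graph = build_adjacency_graph(blocks)
reduced, members = merge_linear_chains(graph)
order = common_order(reduced)
print(members)
print(order)
```

For this toy input, blocks A and B are contracted into one node because they are adjacent in both species, while C and D stay separate, and the resulting order is A, C, D. Rearrangements between species would introduce cycles, which `TopologicalSorter` rejects with a `CycleError`; a real tool would need further rewrite rules to deal with such cases.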

 

Project Members

 

ScaDS Dresden/Leipzig

Fabian Externbrink

Falco Kirchner (Student)

Bioinformatics

Dr. Lydia Müller

Dr. Christian Höner zu Siederdissen

Prof. Peter F. Stadler

Collaboration partners

Dr. Michael Hiller (MPI-CBG & MPI-PKS)

 

Preliminary Results

A prototype is under construction:

  • Parsing and filtering are implemented
  • A well-suited graph data structure is in use
  • Efficient rewrite rules on the graphs have been identified
  • Results can be produced

A publication of the theory is under way.