Golden Genome Project
Background
In recent years many new genomes have been sequenced, so the amount of data in genomics and transcriptomics keeps growing. This opens up great possibilities for analysing a single species, and also for comparing two or more species with each other. Such a comparison starts from the sequences, but that is only the starting point: in biology, not all mutations can be detected by a local comparison of two sequences. The solution is to draw more information from the context, that is, from the surrounding sequences and from sequences in other species that share the same origin. Using this information, a common coordinate system can be created, which makes comparisons between species straightforward.
Objectives
In this project we aim to establish a tool that creates a common coordinate system from a given genome alignment. The tool uses graphs as its backbone data structure. Graphs have two advantages: on the one hand, graph rewriting can be used to reduce the complexity of the problem; on the other hand, graphs make it possible to obtain different views of the common coordinate system. Because the graphs grow quickly in size, scalable Big Data technologies are used so that the system can handle almost any input size.
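To illustrate how graph rewriting can reduce complexity, the following is a minimal sketch and not the project's actual implementation. It assumes, purely for illustration, that each genome is given as an ordered list of alignment block identifiers, that nodes are blocks, and that an edge connects blocks that are adjacent in some genome. The function names `build_adjacency_graph` and `contract_chains` are hypothetical, and the single rewrite rule shown (merging unbranched chains into one node) is only one simple example of such a rule.

```python
def build_adjacency_graph(block_orders):
    """Build a directed graph from per-genome block orders (assumed input format).

    block_orders maps a genome name to its ordered list of block ids.
    Each node is a tuple of block ids so that merged nodes keep their history.
    Returns a dict mapping every node to the set of its successors.
    """
    succ = {}
    for order in block_orders.values():
        nodes = [(block,) for block in order]
        for node in nodes:
            succ.setdefault(node, set())
        for a, b in zip(nodes, nodes[1:]):
            succ[a].add(b)
    return succ


def contract_chains(succ):
    """Illustrative rewrite rule: merge u -> v when v is the only successor
    of u and u is the only predecessor of v. Repeats until no rule applies.

    This simple version rebuilds the predecessor map after every merge,
    so it is quadratic and only meant for small examples.
    """
    succ = {u: set(vs) for u, vs in succ.items()}  # work on a copy
    changed = True
    while changed:
        changed = False
        pred = {u: set() for u in succ}
        for u, vs in succ.items():
            for v in vs:
                pred[v].add(u)
        for u, vs in list(succ.items()):
            if len(vs) != 1:
                continue
            (v,) = vs
            if u == v or len(pred[v]) != 1:
                continue
            merged = u + v  # concatenate the block ids of both nodes
            out = succ.pop(v)
            succ.pop(u)
            # Edges that pointed back into the merged pair become self-loops.
            succ[merged] = {merged if w in (u, v) else w for w in out}
            # Redirect edges from predecessors of u to the merged node.
            for ws in succ.values():
                if u in ws:
                    ws.discard(u)
                    ws.add(merged)
            changed = True
            break
    return succ


# Hypothetical toy input: two genomes that share blocks A, B, C and then diverge.
orders = {"species1": ["A", "B", "C", "D"], "species2": ["A", "B", "C", "E"]}
graph = build_adjacency_graph(orders)
reduced = contract_chains(graph)
```

In this toy input the five block nodes reduce to three: the shared chain A, B, C becomes a single node with the two diverging blocks as its successors. At realistic input sizes a distributed graph framework would replace the in-memory dictionaries used here.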
Project Members
ScaDS Dresden/Leipzig
Fabian Externbrink
Falco Kirchner (Student)
Bioinformatics
Dr. Lydia Müller
Dr. Christian Höner zu Siederdissen
Prof. Peter F. Stadler
Collaboration partners
Dr. Michael Hiller (MPI-CBG & MPI-PKS)
Preliminary Results
A prototype is under construction:
- Parsing and filtering are implemented
- A well-suited graph data structure is in use
- Efficient rewrite rules on the graphs have been identified
- Results can be produced
The underlying theory is on its way to publication.