Genome Alignment Processing with MongoDB, Flink and Gelly
This bioinformatics summer school workshop works with genome alignments. The data is broken down into positions to create a coordinate system between the species. Some background knowledge in bioinformatics is recommended but not required, and no biological knowledge is needed to solve the problems.
On the first day, MongoDB basics and its usage in a Java-based data processing engine are introduced. The focus lies on the connection between Apache Flink/Java and MongoDB. Participants learn how to read data in Java, convert it into a MongoDB format, and write it to MongoDB using the Hadoop connector. Afterwards, some example queries are executed to explain how the MongoDB query engine works.
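As a rough illustration of the "convert it into a MongoDB format" step, the sketch below breaks one pairwise alignment into per-position records in plain Java. It is not the workshop's code: the class name, the method `toPositionDocuments`, the field names (`speciesA`, `posA`, ...) and the use of `-` as the gap character are all assumptions, and a plain `Map` stands in for a MongoDB document so that the example runs without the MongoDB driver or the Hadoop connector.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of MongoDB, Flink, or the workshop material.
class AlignmentToDocuments {

    /**
     * Walks two aligned rows column by column. A '-' is treated as a gap and
     * does not advance that species' coordinate. Every column in which both
     * rows carry a base yields one document linking the two positions.
     */
    static List<Map<String, Object>> toPositionDocuments(
            String speciesA, int startA, String rowA,
            String speciesB, int startB, String rowB) {
        if (rowA.length() != rowB.length()) {
            throw new IllegalArgumentException("aligned rows must have equal length");
        }
        List<Map<String, Object>> docs = new ArrayList<>();
        int posA = startA;
        int posB = startB;
        for (int col = 0; col < rowA.length(); col++) {
            boolean baseA = rowA.charAt(col) != '-';
            boolean baseB = rowB.charAt(col) != '-';
            if (baseA && baseB) {
                // Map used as a stand-in for a MongoDB document (assumed field names).
                Map<String, Object> doc = new LinkedHashMap<>();
                doc.put("speciesA", speciesA);
                doc.put("posA", posA);
                doc.put("speciesB", speciesB);
                doc.put("posB", posB);
                docs.add(doc);
            }
            if (baseA) posA++;
            if (baseB) posB++;
        }
        return docs;
    }

    public static void main(String[] args) {
        // Made-up toy alignment: one gap in each row.
        List<Map<String, Object>> docs =
                toPositionDocuments("human", 100, "AC-GT", "mouse", 500, "ACTG-");
        docs.forEach(System.out::println);
    }
}
```

Storing one record per aligned position is only one possible layout. It makes lookups by coordinate simple, at the cost of many small documents.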
The second day gives examples of how to work with Apache Flink. The goal is to find overlapping sequences and delete them. Using real-world biological examples, we show how to use the Flink engine and its most important operators, such as group and join.
On the third day we focus on Flink Gelly, an API for graph analysis in Flink. We show how to transform the alignment data into a graph. The graph gives another view of the data and allows us to compute more complex statistics. Following the alignment example, participants will learn how to create and handle graphs in Gelly.
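To make the second day's overlap task more concrete, here is a minimal plain-Java sketch of the underlying logic: group segments by sequence, sort each group by start, and drop every segment that overlaps another one. It deliberately does not use the Flink API. The `Segment` class, the half-open `[start, end)` convention, and the reading of "delete" as "drop every segment involved in an overlap" are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Flink or the workshop material.
class OverlapFilter {

    // Assumed shape of an aligned segment: half-open interval [start, end) on one sequence.
    static final class Segment {
        final String sequenceId;
        final int start;
        final int end;

        Segment(String sequenceId, int start, int end) {
            this.sequenceId = sequenceId;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return sequenceId + ":" + start + "-" + end;
        }
    }

    /** Returns only the segments that overlap no other segment on the same sequence. */
    static List<Segment> dropOverlapping(List<Segment> segments) {
        // "Group" step: collect segments per sequence id (TreeMap keeps the output order stable).
        Map<String, List<Segment>> groups = new TreeMap<>();
        for (Segment s : segments) {
            groups.computeIfAbsent(s.sequenceId, k -> new ArrayList<>()).add(s);
        }

        List<Segment> kept = new ArrayList<>();
        for (List<Segment> group : groups.values()) {
            List<Segment> sorted = new ArrayList<>(group);
            sorted.sort(Comparator.comparingInt((Segment s) -> s.start));

            int maxEndBefore = Integer.MIN_VALUE; // largest end among earlier segments
            for (int i = 0; i < sorted.size(); i++) {
                Segment current = sorted.get(i);
                boolean overlapsEarlier = current.start < maxEndBefore;
                boolean overlapsLater = i + 1 < sorted.size()
                        && sorted.get(i + 1).start < current.end;
                if (!overlapsEarlier && !overlapsLater) {
                    kept.add(current);
                }
                maxEndBefore = Math.max(maxEndBefore, current.end);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up toy input: two overlapping segments on chr1, two touching ones on chr2.
        List<Segment> input = Arrays.asList(
                new Segment("chr1", 0, 10),
                new Segment("chr1", 5, 15),
                new Segment("chr1", 20, 30),
                new Segment("chr2", 0, 5),
                new Segment("chr2", 5, 8));
        System.out.println(dropOverlapping(input)); // prints [chr1:20-30, chr2:0-5, chr2:5-8]
    }
}
```

Because the intervals are half-open, the two touching segments on `chr2` are not treated as overlapping. In a distributed engine such as Flink, the per-sequence grouping would typically be expressed with a grouping operator rather than an in-memory map.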