Machine Learning Community (MLC) Dresden - second workshop

We are glad to announce the second workshop of the Machine Learning Community (MLC) Dresden, which will take place on 16th of May 2019 at Helmholtz-Zentrum Dresden-Rossendorf (HZDR). Researchers in the field of machine learning are invited to come together, exchange ideas, discuss problems and plan future collaborations in an easy-going atmosphere.

We welcome abstracts for talks, submitted by email, by 25th of April. Please choose the length of your presentation: 10, 25 or 45 minutes. Short talks may also be used to present questions to the community, open points for discussion and work in progress. The abstract should be between three lines and half a page long. To participate, please send a short registration email with your name and home institution by 25th of April.

The workshop will take place in the large auditorium of HZDR from 10 a.m. to 5 p.m. There is a direct bus connection from Dresden to HZDR: bus 261 departs at 9:15 from Dresden Hauptbahnhof and at 9:25 from Dresden Albertplatz.

A short look back at the activities of MLC Dresden over the past year:

  • 15th of May 2018: first workshop (kick-off) at TU Dresden
  • 26th of June 2018: first thematic meeting on "ANNs: from Black Box to Open Book"
  • 8th of November 2018: second thematic meeting on "Medical and Biological Image Segmentation"
  • 20th of December 2018: third thematic meeting on "Git for Data Scientists"

More Information on MLC can be found at

We look forward to seeing you there and to having fruitful discussions.

The MLC organising team:

Heide Meissner (HZDR)
Jeffrey Kelling (HZDR)
Peter Winkler (TU Dresden, ScaDS)
Steffen Seitz (TU Dresden)

Fusion of HPC and Data Analytics (HPC-DA)

The ZIH is expanding its high-performance computer with system components for analysing large, complex data sets. The extension offers researchers more than 2 petabytes of flash storage with a bandwidth of about 2 terabytes/s. The flash storage is flexibly configurable and can be used from all existing ZIH compute nodes. For large data volumes, an object store of 10 petabytes is also provided. Both solutions are supplied by NEC. As an interface between HPC and data analytics, "HPC-DA" offers scalable virtual research environments tailored to user requirements. The computing capacity will be extended by 22 IBM Power-9 nodes, each with six Nvidia V100 GPUs, which will be connected to the storage systems mentioned and thus provide one of the currently most powerful machine learning infrastructures in Germany. Overall, the system makes it possible to flexibly combine different technologies into efficient and customizable research infrastructures. The installation will be open to users from all over Germany whose HPC and big data use cases can benefit particularly from HPC-DA.
Project proposals can be submitted via the HPC gateway of the ZIH. (Contact: Dr. Ulf Markwardt, Tel.: 0049-351-463-33640)


Next week (Tuesday, Jan. 15th, 15:30) Dr. Martin Beck will be the guest speaker in our ScaDS Colloquium. You can find the title and abstract below.
You are invited to join the presentation.
Time & Date: Tuesday, Jan. 15th, 2019, 3:30 pm
Location: ScaDS Meetingroom, Ritterstrasse 9-13, 04109 Leipzig
Speaker: Dr. Martin Beck, TU Dresden
Title: PrivApprox: Privacy-Preserving Stream Analytics
How to preserve users’ privacy while supporting high-utility analytics for low-latency stream processing?
To answer this question, we describe the design, implementation and evaluation of PRIVAPPROX, a data analytics system for privacy-preserving stream processing. PRIVAPPROX provides three important properties: (i) Privacy: a zero-knowledge privacy guarantee for users, a privacy bound tighter than the state-of-the-art differential privacy; (ii) Utility: an interface for data analysts to systematically explore the trade-offs between the output accuracy (with error estimation) and the query execution budget; (iii) Latency: near real-time stream processing based on a scalable “synchronization-free” distributed architecture.
The key idea behind our approach is to combine two techniques, namely sampling (used for approximate computation) and randomized response (used for privacy-preserving analytics). The two are complementary: together they achieve stronger privacy guarantees and also improve the performance of stream analytics.
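As a rough illustration of how sampling and randomized response can be combined, the following Python sketch simulates a single yes/no query. It is not the PRIVAPPROX implementation: the function names, the parameters p (probability of answering truthfully), q (probability of a forced "yes" otherwise) and sample_rate, and the simple debiasing estimator are all assumptions made for this example.

```python
import random


def randomized_response(truth: bool, p: float, q: float, rng: random.Random) -> bool:
    """Answer truthfully with probability p; otherwise answer 'yes' with probability q.

    p and q are illustrative parameters, not values taken from PRIVAPPROX.
    """
    if rng.random() < p:
        return truth
    return rng.random() < q


def estimate_yes_fraction(responses, p: float, q: float) -> float:
    """Debias the observed 'yes' fraction.

    Since E[observed] = p * true + (1 - p) * q, we solve for the true fraction.
    """
    observed = sum(responses) / len(responses)
    return (observed - (1 - p) * q) / p


def simulate(true_answers, sample_rate: float, p: float, q: float, seed: int = 0) -> float:
    """Each client takes part with probability sample_rate (sampling),
    then perturbs its own answer locally (randomized response)."""
    rng = random.Random(seed)
    responses = [
        randomized_response(truth, p, q, rng)
        for truth in true_answers
        if rng.random() < sample_rate
    ]
    if not responses:
        raise ValueError("no client was sampled; increase sample_rate")
    return estimate_yes_fraction(responses, p, q)


if __name__ == "__main__":
    # Hypothetical population: 30% of 200,000 clients would truthfully answer 'yes'.
    population = [i < 60_000 for i in range(200_000)]
    estimate = simulate(population, sample_rate=0.5, p=0.7, q=0.5)
    print(f"estimated yes fraction: {estimate:.3f}")
```

In this sketch only about half of the clients respond, and no individual response reveals the client's true answer with certainty, yet the aggregate estimate stays close to the true fraction. Lowering sample_rate or p trades accuracy for a smaller query budget or more privacy.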

Fusion of HPC and Data Analytics (HPC-DA)

The ZIH is expanding its high-performance computer with system components for analysing large, complex data sets. The extension offers researchers more than 2 petabytes of flash storage with a bandwidth of about 2 terabytes/s; the flash storage is flexibly configurable and can be used from all existing ZIH compute nodes. For large data volumes, an object store of 10 petabytes is also provided. Both solutions are supplied by NEC. As an interface between HPC and data analytics, "HPC-DA" offers scalable virtual research environments tailored to users' requirements. The computing capacity will be extended by 22 IBM Power-9 nodes, each with six Nvidia V100 GPUs, which will be connected to the storage systems mentioned and thus provide one of the currently most powerful machine learning infrastructures in Germany. Overall, the system makes it possible to flexibly combine different technologies into efficient and customizable research infrastructures. The installation will be open to users from all over Germany whose HPC and big data use cases can benefit particularly from HPC-DA. Production operation begins at the end of 2018; project proposals can be submitted via the application portal of the ZIH. (Contact: Dr. Ulf Markwardt, Tel.: 0049-351-463-33640)

The research paper "Using Link Features for Entity Clustering in Knowledge Graphs" has received the Best Research Paper Award of the 15th Extended Semantic Web Conference (ESWC), held in June 2018 in Heraklion, Greece. The paper describes the CLIP algorithm for entity clustering, which substantially outperforms previous approaches and can also be applied to repair entity clusters. CLIP has been added to the FAMER tool, a system for parallel multi-source entity resolution based on Apache Flink. The awarded paper is authored by Alieh Saeedi, Eric Peukert and Erhard Rahm from the database group Leipzig and the Big Data Center ScaDS; Alieh presented the paper at the conference. The ESWC 2018 research track accepted 31 papers out of 132 submissions (an acceptance rate of about 23%), so the Best Research Paper Award represents a significant distinction.