


Summer school on Big Data and Machine Learning successfully finished

From August 17th to 23rd, 2019, the two Germany-based Big Data Competence Centers ScaDS Dresden/Leipzig and BBDC held the fifth international summer school on Big Data and Machine Learning in Dresden. This time, the summer school bridged the gap between the research fields of Big Data and machine learning, with contributions from many internationally renowned experts from various fields. The highly regarded program included keynotes from IBM, NVIDIA and Intel, talks by researchers from both competence centers, BBDC and ScaDS Dresden/Leipzig, and talks by invited speakers. The program covered a wide range of topics around large-scale and data-intensive computing (Big Data) as well as exciting new trends in machine learning, such as uncertainty quantification, distributed machine learning and architectural optimization for deep learning. Almost sixty participants not only attended the talks and connected with the experts, but could also present a poster on their own research, both in a dedicated poster session and throughout the week, to spark discussions among participants. As a social activity, an archery tournament brought fun and a change of pace to the program and sparked some friendly competition among the participants. Stay in touch with us about future activities, e.g. the Big Data and AI in Business Workshop on September 19-20 in Leipzig!

IBM keynote during summer school

Machine Learning Community (MLC) Dresden - second workshop

We are glad to announce the second workshop of the Machine Learning Community (MLC) Dresden, which will take place on 16th of May 2019 at Helmholtz-Zentrum Dresden-Rossendorf (HZDR). It is co-organised by HZDR and the Competence Center for Scalable Data Services and Solutions (ScaDS). At this workshop, researchers in the field of machine learning are invited to come together, exchange ideas, discuss problems and plan future collaborations in a relaxed atmosphere.

We welcome abstracts for talks, sent by e-mail to the organisers by 25th of April. Please indicate the length of your presentation: 10, 25 or 45 minutes. The short talks can also be used to pose questions to the community, raise open points for discussion or present work in progress. Abstracts should be between three lines and half a page long. To participate, please send a short registration e-mail with your name and home institution by 25th of April.

The workshop will take place in the large auditorium of HZDR from 10 a.m. to 5 p.m. There is a direct bus connection from Dresden to HZDR: bus 261 departs at 9:15 from Dresden Hauptbahnhof and at 9:25 from Dresden Albertplatz.

A short look back to the activities of MLC Dresden during the last year:

  • 15th of May 2018: first workshop (kick-off) at TU Dresden
  • 26th of June 2018: first thematic meeting on "ANNs: from Black Box to Open Book"
  • 8th of November 2018: second thematic meeting on "Medical and Biological Image Segmentation"
  • 20th of December 2018: third thematic meeting on "Git for Data Scientists"

More information on MLC can be found at

We are looking forward to seeing you there and to having fruitful discussions,

The MLC organising team:

Heide Meissner (HZDR)
Jeffrey Kelling (HZDR)
Peter Winkler (TU Dresden, ScaDS)
Steffen Seitz (TU Dresden)

Next week (Tuesday, Jan. 15th, 15:30) we will have Dr. Martin Beck as a guest speaker in our ScaDS colloquium. You can find the title and abstract below.
You are invited to join the presentation.
Time & Date: Tuesday, Jan. 15th, 2019, 3:30 pm
Location: ScaDS meeting room, Ritterstrasse 9-13, 04109 Leipzig
Speaker: Dr. Martin Beck, TU Dresden
Title: PrivApprox: Privacy-Preserving Stream Analytics
How to preserve users’ privacy while supporting high-utility analytics for low-latency stream processing?
To answer this question, we describe the design, implementation and evaluation of PrivApprox, a data analytics system for privacy-preserving stream processing. PrivApprox provides three important properties: (i) Privacy: a zero-knowledge privacy guarantee for users, a privacy bound tighter than state-of-the-art differential privacy; (ii) Utility: an interface for data analysts to systematically explore the trade-offs between output accuracy (with error estimation) and the query execution budget; (iii) Latency: near real-time stream processing based on a scalable "synchronization-free" distributed architecture.
The key idea behind our approach is to marry two techniques, namely sampling (used for approximate computation) and randomized response (used for privacy-preserving analytics). The resulting marriage is complementary: it achieves stronger privacy guarantees and also improves the performance of stream analytics.
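The combination of sampling and randomized response described in the abstract can be illustrated with a minimal Python sketch for a single yes/no query. It uses the textbook two-coin form of randomized response together with Bernoulli client sampling; it is not PrivApprox's implementation, and the parameter names `s` (sampling probability), `p` (probability of answering truthfully) and `q` (probability of a random "yes"), as well as the function names, are assumptions made for this example.

```python
import math
import random


def client_response(truth: bool, p: float, q: float, rng: random.Random) -> bool:
    """Randomized response: with probability p answer truthfully,
    otherwise answer "yes" with probability q regardless of the truth."""
    if rng.random() < p:
        return truth
    return rng.random() < q


def sampled_private_query(truths: list[bool], s: float, p: float, q: float,
                          seed: int = 0) -> tuple[float, float]:
    """Estimate the fraction of "yes" clients from sampled, randomized answers.

    Each client takes part with probability s (sampling); each participant
    sends only a randomized response (privacy). Returns (estimate, stderr).
    """
    rng = random.Random(seed)
    answers = [client_response(t, p, q, rng) for t in truths if rng.random() < s]
    n = len(answers)
    if n == 0:
        raise ValueError("no clients were sampled")
    observed = sum(answers) / n
    # E[observed] = p * y + (1 - p) * q, so invert for the true fraction y.
    estimate = (observed - (1 - p) * q) / p
    # Approximate standard error of this linear estimator (binomial plug-in).
    stderr = math.sqrt(observed * (1 - observed) / n) / p
    return estimate, stderr


if __name__ == "__main__":
    # 30% of the (hypothetical) clients would truthfully answer "yes".
    truths = [i < 30_000 for i in range(100_000)]
    est, err = sampled_private_query(truths, s=0.5, p=0.7, q=0.5, seed=1)
    print(f"estimated fraction: {est:.3f} +/- {1.96 * err:.3f}")
```

In this sketch, lowering `p` gives each individual answer more plausible deniability, and lowering `s` means fewer clients have to respond; both show up as a larger standard error, which mirrors the accuracy-versus-budget trade-off mentioned in the abstract.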

Fusion of HPC and Data Analytics (HPC-DA)

ZIH is extending its high-performance computer with system components for analyzing large, complex data sets. The extension offers researchers more than 2 petabytes of flash storage with a bandwidth of about 2 terabytes/s; the flash storage is flexibly configurable and can be used from all existing ZIH compute nodes. For large data volumes, a 10-petabyte object store is also provided. Both solutions are supplied by NEC. As an interface between HPC and data analytics, "HPC-DA" offers scalable virtual research environments tailored to the requirements of its users. The computing capacity is being extended by 22 IBM Power9 nodes, each with six Nvidia V100 GPUs, which are connected to the storage systems mentioned above, making this one of the most powerful machine learning infrastructures currently available in Germany. Overall, the system makes it possible to flexibly combine different technologies into efficient, customizable research infrastructures. The installation will be open to users from all over Germany whose HPC and big data use cases can particularly benefit from HPC-DA. Production operation starts at the end of 2018; project proposals can be submitted via the ZIH application portal. (Contact: Dr. Ulf Markwardt, Tel.: 0049-351-463-33640)
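To put the figures above into perspective, here is a back-of-the-envelope calculation. It assumes decimal units (1 PB = 10^15 bytes, 1 TB = 10^12 bytes) and that the quoted aggregate bandwidth could be sustained for an entire transfer, which real workloads will not necessarily achieve.

```latex
\begin{aligned}
N_{\text{GPU}} &= 22\ \text{nodes} \times 6\ \tfrac{\text{GPUs}}{\text{node}} = 132\ \text{GPUs} \\
t_{\text{full read}} &\approx \frac{2\ \text{PB}}{2\ \text{TB/s}}
  = \frac{2 \times 10^{15}\ \text{B}}{2 \times 10^{12}\ \text{B/s}}
  = 1000\ \text{s} \approx 17\ \text{min}
\end{aligned}
```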

The research paper "Using Link Features for Entity Clustering in Knowledge Graphs" received the Best Research Paper Award at the 15th Extended Semantic Web Conference (ESWC), held in June 2018 in Heraklion, Greece. The paper describes CLIP, an algorithm for entity clustering that substantially outperforms previous approaches and can also be applied to repair entity clusters. CLIP has been added to FAMER, a system for parallel multi-source entity resolution based on Apache Flink. The paper is authored by Alieh Saeedi, Eric Peukert and Erhard Rahm from the Leipzig database group and the Big Data center ScaDS; Alieh presented the paper at the conference. The ESWC 2018 research track accepted 31 of 132 submitted papers, which makes the Best Research Paper Award a significant distinction.
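To give a rough idea of what clustering with link features can look like, here is a minimal, hypothetical Python sketch. It assumes duplicate-free sources and precomputed similarity links between entities of different sources. A link is labelled "strong" if both endpoints are each other's best match in the other's source, "normal" if this holds for only one endpoint, and "weak" otherwise; strong links are then merged greedily by similarity, keeping at most one entity per source in each cluster. This is a simplified illustration of the general idea, not the published CLIP algorithm or FAMER code, and all names (`best_matches`, `classify_links`, `strong_link_clusters`) are made up for this example.

```python
from collections import defaultdict

Entity = tuple[str, str]  # (source name, entity id)
Links = dict[tuple[Entity, Entity], float]  # link -> similarity


def best_matches(links: Links) -> dict[tuple[Entity, str], tuple[float, Entity]]:
    """For each entity and each other source, keep its highest-similarity neighbour."""
    best: dict[tuple[Entity, str], tuple[float, Entity]] = {}
    for (a, b), sim in links.items():
        for x, y in ((a, b), (b, a)):
            key = (x, y[0])  # entity x, looking into the source of y
            if key not in best or sim > best[key][0]:
                best[key] = (sim, y)
    return best


def classify_links(links: Links) -> dict[tuple[Entity, Entity], str]:
    """Count how many endpoints see the other endpoint as their best match."""
    best = best_matches(links)
    labels = {}
    for a, b in links:
        votes = (best[(a, b[0])][1] == b) + (best[(b, a[0])][1] == a)
        labels[(a, b)] = ("weak", "normal", "strong")[votes]
    return labels


def strong_link_clusters(links: Links) -> list[list[Entity]]:
    """Union-find over strong links, highest similarity first, merging two
    clusters only if they share no source (source-consistent clusters)."""
    labels = classify_links(links)
    parent: dict[Entity, Entity] = {}
    sources: dict[Entity, set[str]] = {}

    def find(x: Entity) -> Entity:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        for x in (a, b):
            if x not in parent:
                parent[x] = x
                sources[x] = {x[0]}

    strong = [(sim, a, b) for (a, b), sim in links.items() if labels[(a, b)] == "strong"]
    for _, a, b in sorted(strong, reverse=True):
        ra, rb = find(a), find(b)
        if ra != rb and not (sources[ra] & sources[rb]):
            parent[ra] = rb
            sources[rb] |= sources[ra]

    clusters: dict[Entity, list[Entity]] = defaultdict(list)
    for x in parent:
        clusters[find(x)].append(x)
    return sorted(sorted(c) for c in clusters.values())
```

Normal and weak links are simply ignored in this sketch; a fuller approach would presumably also make use of them.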