ScaDS Logo

COMPETENCE CENTER
FOR SCALABLE DATA SERVICES
AND SOLUTIONS

Entity Resolution (also known as Deduplication, Record Linkage, Link Discovery) refers to the task of identifying entities, which refer to the same real-world entity. Entities are usually matched by determining the similarity between them and this similarity is then used to determine if the entities are the same. With a plethora of different similarity measures and possibilities of combining them, creation good match conditions can be a cumbersome process of trial and error. This is why machine learning approaches are used to aid in this process.

This bachelor thesis consists of integrating the decision-tree based DRAGON algorithm into FAMER(FAst Multi-source Entity Resolution system), a scalable framework for distributed multi-source entity resolution implemented with Apache Flink™ .

Contact: Diese E-Mail-Adresse ist vor Spambots geschützt! Zur Anzeige muss JavaScript eingeschaltet sein!