B.Sc. Thesis: Scalable and Accurate Decision-Tree Learning for Entity Resolution
Entity Resolution (also known as Deduplication, Record Linkage, or Link Discovery) is the task of identifying entities that refer to the same real-world object. Entities are usually matched by computing the similarity between them and then using this similarity to decide whether they are the same. With a plethora of similarity measures and ways of combining them, creating good match conditions can be a cumbersome process of trial and error. This is why machine learning approaches are used to aid in this process.
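To make the notion of a match condition concrete, the following minimal Python sketch combines one similarity measure with an exact comparison. The record fields (`name`, `city`), the choice of Jaccard similarity over word tokens, and the threshold of 0.5 are illustrative assumptions, not taken from the thesis or from FAMER.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the lowercase word sets of two strings."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty strings are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def is_match(r1: dict, r2: dict, name_threshold: float = 0.5) -> bool:
    """Hand-written match condition (hypothetical): similar name AND same city."""
    similar_name = jaccard(r1["name"], r2["name"]) >= name_threshold
    same_city = r1["city"].lower() == r2["city"].lower()
    return similar_name and same_city


# Hypothetical example records
a = {"name": "John A. Smith", "city": "Leipzig"}
b = {"name": "john smith", "city": "leipzig"}
c = {"name": "Jane Doe", "city": "Leipzig"}

print(is_match(a, b))
print(is_match(a, c))
```

Both the measure and the threshold here were picked by hand; with many attributes and measures, the number of such combinations grows quickly, which is the trial-and-error effort described above.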
This bachelor thesis consists of integrating the decision-tree-based DRAGON algorithm into FAMER (FAst Multi-source Entity Resolution system), a scalable framework for distributed multi-source entity resolution implemented with Apache Flink™.
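As a rough illustration of what learning a match condition from labeled pairs can look like, the sketch below fits a single-split decision tree (a decision stump) on one similarity value. This is not the DRAGON algorithm and does not use any FAMER or Flink API; the function name `learn_stump`, the accuracy criterion, and the toy data are assumptions made for this example only.

```python
def learn_stump(similarities, labels):
    """Return the threshold t such that the rule `similarity >= t`
    classifies the most labeled pairs correctly (ties keep the lowest t)."""
    best_threshold, best_correct = None, -1
    for t in sorted(set(similarities)):
        # Count pairs where the predicted label (s >= t) equals the true label
        correct = sum((s >= t) == y for s, y in zip(similarities, labels))
        if correct > best_correct:
            best_threshold, best_correct = t, correct
    return best_threshold


# Hypothetical training data: similarity of each candidate pair and
# whether the pair was labeled as a true match.
sims = [0.9, 0.8, 0.75, 0.4, 0.3, 0.1]
labels = [True, True, True, False, False, False]

threshold = learn_stump(sims, labels)
print(threshold)
```

A full decision tree would repeat this kind of split recursively over several similarity measures and attributes, so that the learned tree can be read as a nested match condition.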