Privacy Preserving Record Linkage
Linking person related data adds a new challenge to the two ones already presented, privacy. Such data, like healthcare data shown in the example of Fig.1 or financial data, contain highly sensitive information about people and their process is generally restricted by law. Therefore, new methods and technique must be integrated in the standard linkage process to overcome this challenge.
Privacy Preserving Record Linkage follows almost the same steps described above. However, several modifications are introduced to fit in the privacy context (shown in red in Fig.3). We start by presenting which kind of modification is needed in each step to preserve privacy. Practical realizations of these modifications will be presented later on (some in this post and others in future posts).
At the beginning, in the preprocessing step, the cleaned and normalized records must be encoded/encrypted by the data owners. Encryption techniques are discussed later on. The main challenge when encoding records is to make them anonymous while preserving their possible similarity. In the blocking phase new techniques must be developed because traditional methods that use (part of) fields may disclose the identity of the person. Generally the same techniques used to compare clear text records are used for encoded ones, however not all classification methods can be used. For example supervised learning methods need a ground truth that is not always available in PPRL due to privacy concerns. The evaluation of result’s quality is even more challenging due to the lack of golden standard.
Note that the steps shown in Fig. 3 present a general workflow of PPRL. Depending on the followed protocol (explained below) some steps might be differently executed or even missing.