Data protection is a hot topic and a source of debate. Unfortunately, this sometimes gives rise to myths. In this article, I would like to get to the bottom of one of these myths.
When it comes to data protection, everyone has their own opinion - I've probably witnessed this myself in almost a hundred conversations over the last few years. Unfortunately, in these conversations it happens again and again that generalizing statements are made or opinions are presented as fact.
Especially at the beginning of my research into data protection, I was often unsettled by this. Over time, however, I realized that some of the statements were not tenable - they were often just personal opinions that were simply presented with a good dose of self-confidence. So I did more research and sometimes (admittedly not always) came up with different results.
One of my principles developed from this: "If everyone says that something is not possible, don't just believe it, but question it." In other words, if the answer is not meaningful, comprehensible or backed up with facts, convince yourself and find out why it should not be possible. Because if there is no answer, then maybe it is possible after all.
"That's not possible, it's just a pseudonym"
One statement that appears again and again in different variations is that consent-free tracking with a pseudonymized or anonymized identifier is either not permitted or useless. In my opinion, however, this statement is based on a misunderstanding of the definitions of the terms "anonymization" and "pseudonymization" as well as the status of the identifier for the respective party in this consideration. Admittedly, the topic is not entirely trivial - I will therefore present the definitions with specific examples and try to simplify the topic.
Definition: Pseudonymization
"'Pseudonymization' means the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organizational measures to ensure that the personal data are not attributed to an identified or identifiable natural person;"
Source: GDPR, Article 4(5), https://gdpr-info.eu/art-4-gdpr/
This means that if an IP address (which is always personal data per se) is run through a hash algorithm (e.g. MD5, SHA-X, etc.), the result is a pseudonym of this IP address. If the input value is enriched well enough (IP + user agent + other characteristics), it is currently not feasible to recover the original input from the generated hash value (see also recital 26 of the GDPR). De-pseudonymization is then only possible with the help of a mapping table (IP A = HASH A | IP B = HASH B | ...), i.e. by "remembering" which hash belongs to which IP.
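To make the role of the mapping table more tangible, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the function names, the choice of SHA-256 and the sample visitors (addresses from the documentation ranges) are hypothetical and not taken from any real tracking product.

```python
import hashlib
from typing import Dict, Optional


def pseudonymize(ip: str, user_agent: str) -> str:
    """Hash the enriched input (IP + user agent) into a hex pseudonym."""
    enriched = f"{ip}|{user_agent}"
    return hashlib.sha256(enriched.encode("utf-8")).hexdigest()


# Hypothetical visitors, used only for this illustration.
visitors = [
    ("203.0.113.7", "Mozilla/5.0 (X11; Linux x86_64)"),
    ("198.51.100.23", "Mozilla/5.0 (Macintosh)"),
]

# The mapping table "remembers" which hash belongs to which IP.
mapping_table: Dict[str, str] = {
    pseudonymize(ip, ua): ip for ip, ua in visitors
}


def de_pseudonymize(pseudonym: str, table: Dict[str, str]) -> Optional[str]:
    """Look the pseudonym up in the table; without an entry there is no way back."""
    return table.get(pseudonym)


pseudonym = pseudonymize(*visitors[0])
print(de_pseudonymize(pseudonym, mapping_table))  # lookup succeeds with the table
print(de_pseudonymize(pseudonym, {}))             # lookup fails without the table
```

Whoever holds the table can reverse the pseudonym with a simple lookup; whoever only holds the hash cannot.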
Let's look at this with a tracking example:
A user visits the website www.example.com
The user's browser transmits some information to the website operator's web server, including the IP - example: 192.168.179.10
The site operator uses a tracking service provider on his site (e.g. Anogate) with whom there is a data processing agreement.
The user's browser then loads the JavaScript of the tracking service provider and information is also transmitted to it, including the IP - here 192.168.179.10 (Note: I will address the TTDSG/ePrivacy issue separately)
The tracking service provider creates a hash as described above, e.g.
SHA1(192.168.179.10|Mozilla/5.0 (iPhone;[..]|{SITE_HASH}|{DAILY_HASH})
=f89cabde53656358793144520e986323d863f6dd
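The hash generation from the last step could look roughly like the following Python sketch. Please treat it as an assumption-laden illustration: the function name tracking_hash, the "|" separator handling and the placeholder values for the site hash, the daily hash and the user agent are mine and do not describe how any specific provider implements this.

```python
import hashlib


def tracking_hash(ip: str, user_agent: str, site_hash: str, daily_hash: str) -> str:
    """Join the enriched input with '|' and return its SHA-1 hex digest."""
    enriched = "|".join([ip, user_agent, site_hash, daily_hash])
    return hashlib.sha1(enriched.encode("utf-8")).hexdigest()


# Placeholder values (hypothetical, for illustration only).
ip = "192.168.179.10"
user_agent = "Mozilla/5.0 (example user agent)"
site_hash = "site-secret-placeholder"

today = tracking_hash(ip, user_agent, site_hash, "daily-secret-day-1")
tomorrow = tracking_hash(ip, user_agent, site_hash, "daily-secret-day-2")

print(today)              # 40 hex characters
print(today == tomorrow)  # False: a new daily value yields a new identifier
```

The same visitor gets the same identifier for as long as all four inputs stay the same. As soon as the daily value changes, the identifier changes too, so visits cannot be linked across days via the hash.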
This is where it gets exciting - let's take a look at who has what data:
The site operator has the visitor's IP, but not the generated hash.
At this moment, the tracking service provider has both the IP and the hash and can assign the values. The hash therefore corresponds to a pseudonym of the IP.
Let's leave the example alone for now and look at the topic of anonymization.
Definition: Anonymization
Anonymizing is the changing of personal data in such a way that the individual information about personal or factual circumstances can no longer be assigned to a specific or identifiable natural person or can only be assigned to a specific or identifiable natural person with a disproportionate amount of time, costs and manpower.
Source: BDSG (old version), § 3(6), https://dejure.org/gesetze/BDSG_a.F./3.html (German)
There is a clear example of the difference in the Wikipedia entry on the topic of "anonymization and pseudonymization" (German):
Pseudonymization:
"If a professor at a university wants to make the results of a (written) exam easily accessible to students, he asks them to write down a self-selected pseudonym on the papers during the exam. After the correction, the professor can put up a notice (if necessary on the Internet) in which all results are listed according to the <Pseudonym> <Grade> scheme. This means that the assignment of the pseudonym to the respective student can only be established by the professor or, in individual cases, by the student."
vs.
Anonymization:
"If, in the 'Professor' example above, the examination sheets with the pseudonyms written down by the students were subsequently destroyed, the information on the grade notice would be anonymized for the general public, as it would no longer be possible to assign it to the respective students. However, every student will, since he has remembered his pseudonym, still be able to recognize his own entry on the grade notice."
The Wikipedia example shows that whether data is considered pseudonymized or anonymized depends on the perspective of the party looking at it, but above all on whether the assignment is still possible at all.
Pseudonym without assignment = "anonymized pseudonym"
In relation to the original tracking example, and therefore also to Anogate, this means that if the tracking service provider removes the IP from its memory directly after generating the hash, no assignment between the IP and the hash can occur. At this point, the states look as follows:
The site operator still only has the IP
However, the tracking service provider now only has the hash, but not the IP.
The hash was therefore only a pseudonym for a millisecond and was immediately anonymized - by preventing or destroying the assignment. From this moment on, the tracking service provider can no longer determine an IP based on the hash. The hash is now an anonymized pseudonym. If there is any other personal data in the tracking event, this must of course also be anonymized. However, if there is no further personal data in the tracking event or all of it has been anonymized accordingly, the GDPR no longer applies.
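As a rough sketch of this idea in Python: the function below receives the raw request data, derives the hash and returns only the hash plus the non-personal event data. All names (AnonymizedEvent, anonymize_event, the field names and the placeholder secrets) are hypothetical and only serve the illustration; the sketch shows the data flow, not a legal guarantee, and it is not Anogate's actual implementation.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class AnonymizedEvent:
    """What the provider keeps: the hash and the event data, no IP, no user agent."""
    visitor_hash: str
    event_name: str


def anonymize_event(raw: dict, site_hash: str, daily_hash: str) -> AnonymizedEvent:
    """Derive the hash from the raw request, then drop the personal fields."""
    enriched = "|".join([raw["ip"], raw["user_agent"], site_hash, daily_hash])
    visitor_hash = hashlib.sha1(enriched.encode("utf-8")).hexdigest()
    # Neither the IP nor a hash-to-IP mapping is stored or returned,
    # so no mapping table exists that could reverse the hash later.
    return AnonymizedEvent(visitor_hash=visitor_hash, event_name=raw["event_name"])


raw_request = {
    "ip": "192.168.179.10",
    "user_agent": "Mozilla/5.0 (example user agent)",
    "event_name": "page_view",
}
event = anonymize_event(raw_request, "site-secret-placeholder", "daily-secret-placeholder")
print(event.event_name)                # page_view
print("192.168.179.10" in repr(event)) # False: the IP is not part of the stored event
```

The important design choice is that the assignment between IP and hash never leaves the function: it exists only while the hash is being computed.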
GDPR and anonymous data
According to recital 26, sentence 5, the GDPR does not apply to anonymous data. Many experts also share this view:
Robin Data: "Was sind anonymisierte Daten?" (German)
DataGuard: "Beispiele für die Pseudonymisierung & Anonymisierung von Daten" (German)
Dr. Datenschutz: "Ein Überblick zur Anonymisierung" (German)
Dr. DSGVO: "Anonymisierung von Daten und Datenschutz: Was bedeutet das und welche Rechtsgrundlagen sind relevant?" (German)
However, it is controversial whether the process of anonymization itself constitutes data processing. From the BfDI's perspective, this is the case (see the position paper (German) on the consultation process from June 29, 2020). The industry association Bitkom, in turn, took a different view in the BfDI consultation (see statement). The courts will probably ultimately have to clarify which point of view is "the right one".
If the BfDI's interpretation were to hold up, i.e. if the process of anonymization already constitutes data processing, a legal basis would be necessary, as for all other processing activities. A possible legal basis for this could be Article 6(1)(f) of the GDPR ("legitimate interest") in combination with recital 50, sentence 4, which treats further processing for statistical purposes as lawful ("[..] Further processing for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes should be considered to be compatible lawful processing operations. [..]").
Once the data has been securely anonymized, nothing stands in the way of further processing.
Anogate removes personal data
The idea behind Anogate is to remove the personal data from the tracking data, leaving only the (anonymized) event data, because this is exactly the data that is relevant for most marketers (see "How it all began"). The automated removal of personal data is the challenge we face. To do this, we use various algorithms as well as AI solutions, all with the aim of enabling legally compliant use of your tracking even without consent, so that you receive more data, and more reliable data.